Systems and methods for identifying and analyzing internet users

ABSTRACT

This disclosure describes systems, methods, and apparatus for generating reports enhancing an understanding of Internet users based on their generated content and actions taken by others in response to the generated content.

CLAIM OF PRIORITY UNDER 35 U.S.C. §119

This application is a continuation of U.S. patent application Ser. No. 14/922,835 filed Oct. 26, 2015 and entitled “SYSTEMS AND METHODS FOR IDENTIFYING AND ANALYZING INTERNET USERS” which is a continuation of U.S. patent application Ser. No. 13/773,165 filed Feb. 21, 2013 and entitled “SYSTEMS AND METHODS FOR IDENTIFYING AND ANALYZING INTERNET USERS,” which claims priority to Provisional Application No. 61/601,215 entitled “SOCIAL MARKETING PLATFORM” filed Feb. 21, 2012, and Provisional Application No. 61/719,307 entitled “SYSTEMS AND METHODS FOR IDENTIFYING AND ANALYZING INTERNET USERS” filed Oct. 26, 2012, the entire disclosures of which are hereby incorporated by reference for all proper purposes, as if fully set forth herein. This application is also a continuation of U.S. patent application Ser. No. 14/922,845 filed Oct. 26, 2015 and entitled “SYSTEMS AND METHODS FOR IDENTIFYING AND ANALYZING INTERNET USERS” which is a continuation of U.S. patent application Ser. No. 13/773,165 filed Feb. 21, 2013 and entitled “SYSTEMS AND METHODS FOR IDENTIFYING AND ANALYZING INTERNET USERS,” which claims priority to Provisional Application No. 61/601,215 entitled “SOCIAL MARKETING PLATFORM” filed Feb. 21, 2012, and Provisional Application No. 61/719,307 entitled “SYSTEMS AND METHODS FOR IDENTIFYING AND ANALYZING INTERNET USERS” filed Oct. 26, 2012, the entire disclosures of which are hereby incorporated by reference for all proper purposes, as if fully set forth herein.

FIELD OF THE DISCLOSURE

The present disclosure relates generally to online data analysis. In particular, but not by way of limitation, the present disclosure relates to systems, methods and apparatuses for analyzing Internet users.

BACKGROUND

Many attempts have been made to better understand Internet users, often for marketing purposes. However, these attempts often look at evidence such as web page visits, which only provide an ability to infer what is going on within a user's mind. Most attempts have not looked at ways to directly monitor Internet user beliefs. Those that have are plagued by the challenges of collecting and analyzing enormous data sets.

For instance, social influence, or the capacity to affect others' character, development, or behavior, is subjectively analyzed via manual analysis of online content and manual associations of content with user profiles. Some current methods enable small numbers of influential users to be identified; however, the manual nature of these methods prevents them from being scaled into the tens and hundreds of millions. Other solutions use crowdsourcing or curating to partially overcome the scalability issues associated with manual solutions to these large analysis challenges (e.g., KLOUT and KRED).

SUMMARY OF THE DISCLOSURE

Exemplary embodiments of the present invention that are shown in the drawings are summarized below. These and other embodiments are more fully described in the Detailed Description section. It is to be understood, however, that there is no intention to limit the invention to the forms described in this Summary of the Invention or in the Detailed Description. One skilled in the art can recognize that there are numerous modifications, equivalents and alternative constructions that fall within the spirit and scope of the invention as expressed in the claims.

Some embodiments of the disclosure may be characterized as a server system comprising a network interface, a memory, and a processor. The network interface can receive product, service, or customer data from a client, receive a query from the client, and return a report of social profiles to the client. The memory can store a searchable social profile datastore having one or more social profiles. The processor can run an API, a crawler module, a parser, an analysis module, and a scoring module. The API can receive the product, service, or customer data from the client via the network interface, can receive a query from the client via the network interface, and can return a report of social profiles to the client in response to the query via the network interface. The crawler module can collect content and raw data from the Internet based on the product, service, or customer data. The parser can parse the content and raw data into terms. The analysis module can compute one or more of the following for each term: a reach value, a relevance value, and an impact value. The scoring module can compute scores for one or more social profiles for each term based on the one or more of the reach value, relevance value, and impact value. The scoring module can further add to or update the one or more social profiles with the scores.

Other embodiments of the disclosure may also be characterized as a method for generating reports enhancing an understanding of Internet users based on their generated content and actions taken by others in response to the generated content. The method can include collecting content or other raw data via a crawler module that accesses webpages from a network interface of a server system. The method can further include associating the content or other raw data with a social profile residing in or being added to a memory. The method can yet further include calculating scores for the social profile based on terms parsed from the content or other raw data. The method can yet further include receiving a query, via the network interface, for users fitting one or more contexts. The method can yet further include identifying the users fitting the one or more contexts. The method can also include returning a report having the users in response to the query, the report being transmitted through the network interface.

BRIEF DESCRIPTION OF THE DRAWINGS

Various objects and advantages and a more complete understanding of the present invention are apparent and more readily appreciated by referring to the following detailed description and to the appended claims when taken in conjunction with the accompanying drawings:

FIG. 1 illustrates one embodiment of a system that collects content or other raw data, associates the content with one or more social profiles and/or creates new profiles, and returns reports showing contextual scores and/or rankings for one or more users in one or more contexts;

FIG. 2 illustrates another embodiment of a system that collects content or other raw data, associates the content or other raw data with one or more social profiles and/or creates new profiles, and returns reports showing contextual scores for one or more users in one or more contexts;

FIG. 3 illustrates yet another system that collects content, associates the content with one or more social profiles and/or creates new profiles, and returns reports showing scores for one or more users in one or more contexts;

FIG. 4 illustrates a graph datastore having nodes (users) and edges (relationships) where new nodes and new edges (e.g., nodes 13, 17, 18, and 19 and related new edges) are added to the graph datastore as a result of newly-crawled content;

FIG. 5 illustrates the graph datastore of FIG. 4, but with the addition of a new edge between nodes;

FIG. 6 illustrates the graph datastore of FIG. 5, but with the merger of two nodes;

FIG. 7 illustrates the directionality of relationships between nodes;

FIG. 8 illustrates a portion of a graph datastore showing some nodes and edges between those nodes;

FIG. 9 illustrates one embodiment of a method for generating the new type of data;

FIG. 10 illustrates a method for generating reports enhancing an understanding of Internet users based on their generated content and actions taken in response to the generated content;

FIG. 11 illustrates another method for generating reports enhancing an understanding of Internet users based on their generated content and actions taken in response to the generated content;

FIG. 12 illustrates one arrangement of logical components and functions for marketing based on at least social media content;

FIG. 13 illustrates another arrangement of logical components and functions for marketing based on at least social media content; and

FIG. 14 illustrates a computing system configured to carry out the methods and to interact with the systems and apparatus described in this disclosure.

DETAILED DESCRIPTION

The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any embodiment described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments.

For the purposes of this disclosure, “influence” represents a social influence of an Internet user expressed over the Internet.

For the purposes of this disclosure, a “term” is a single word parsed from raw data collected by a crawler module. A “context” is a category that describes a combination of one or more terms. Terms can be described by different contexts depending on how they are combined with other terms (e.g., “compact cars” v. “maintenance for cars”). In some embodiments, a context comprises exact terms or combinations of terms, while in other embodiments, contexts can describe one or more terms without using the exact term(s). In one embodiment, a single context is a stem (e.g., “swim” is a term defining the context for any content including the words “swim”, “swimming”, “swam”, etc.).

For the purposes of this disclosure, “contextual influence” is the influence between users in a group where inclusion in the group is determined by context. Assuming a context can include up to six words, there are around 10³⁰ different contexts in the English language, and each user can be part of one or more of those contexts and can exert different influence in each context.
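The order of magnitude above can be reproduced with simple counting. The vocabulary size is not stated in this disclosure; the figure below assumes, purely for illustration, a working vocabulary of about V = 10⁵ English words and counts ordered sequences of one to six words:

```latex
\[
V^{6} = \left(10^{5}\right)^{6} = 10^{30},
\qquad
\sum_{k=1}^{6} V^{k} = 10^{5} + 10^{10} + \cdots + 10^{30} \approx 1.00001 \times 10^{30}.
\]
```

Under that assumption, the six-word contexts dominate the count, so allowing shorter contexts does not change the order of magnitude.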

There are numerous analytics platforms known that are configured to analyze Internet users and generate data and reports on those Internet users. However, those platforms typically look at data such as web activity and purchasing activities (e.g., via credit card histories). The herein disclosed systems, methods, and apparatus diverge from the art by producing data (e.g., reports) on Internet users that enhance an understanding of those users based on content created by those users, and actions taken by users relative to the created content. In particular, scores for reach, relevance, and impact in a variety of contexts can be generated and used to derive various scores that help enhance an understanding of Internet users based on their content and actions taken by others in response to that content. Other scores can be used in addition to, or as alternatives to, any one or more of reach, relevance, and impact. This data can be used to supplement traditional analytics and data such as credit card purchase histories.

This new type of data can be used, for instance, to help better understand which Internet users have the most influence on other Internet users, which is particularly important to marketing entities. As another example, the systems, methods, and apparatus can also be used to predict an Internet user's propensity to take an action such as to buy a product. The new type of data can also be used to flesh out user profiles that a company such as NIKON or AMAZON has already built and populated. This disclosure focuses on the systems, methods, and apparatus used to generate the new type of data, and leaves methods for using the data to other discussions.

FIG. 9 illustrates one embodiment of a method for generating the new type of data. First, content or other raw data can be collected in an operation 902 (e.g., via web crawler) and then associated with social profiles in an operation 904. Scores can be calculated for each social profile for each of one or more terms parsed from the collected content in an operation 906. The scores can be reported for certain contexts where a context describes a combination of one or more of the terms parsed from the collected content. This reporting can be done in operation 908.

The content can include webpages and any subcomponents of a webpage. TWITTER tweets, YOUTUBE comments, YOUTUBE tags, blog posts, forum posts and comments, and social media public profiles are just some examples of content that can be collected in the collect operation 902. In some cases, user actions can also be collected. For instance, the actions of retweeting or commenting on a YOUTUBE video are actions that can be collected in the collect operation 902.

Scores can be determined for any or all terms parsed from the collected content. For instance, where a tweet says, “just bought Nikon D600 and high ISO performance is amazing,” the parsed terms may include “Nikon,” “D600,” and “ISO.” Scores may be assigned to the author of the tweet for each of “Nikon,” “D600,” and “ISO.” The scores can include contextual influence scores, or propensity to buy scores, to name two non-limiting examples.
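The tweet example can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed parser: the function name `parse_terms` and the idea of filtering tokens against a set of tracked terms are assumptions made here for clarity.

```python
import re


def parse_terms(text, tracked_terms):
    """Return the tracked terms found in text, in order of first appearance.

    tracked_terms is a hypothetical, lower-cased vocabulary of interest;
    the disclosure does not say how terms of interest are selected.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", text.lower())
    found = []
    for token in tokens:
        if token in tracked_terms and token not in found:
            found.append(token)
    return found


tweet = "just bought Nikon D600 and high ISO performance is amazing"
TRACKED = {"nikon", "d600", "iso"}  # illustrative only
terms = parse_terms(tweet, TRACKED)
```

Each term returned this way could then receive its own score for the tweet's author, such as a contextual influence score or a propensity to buy score.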

The reports can then provide contextual scores that are based on the scores assigned to each term parsed from the collected content. For instance, and recalling the NIKON tweet example, a report could include a score for a context called “Nikon cameras” and another for a context called “enthusiast cameras.” An alternative context could simply be “Nikon D600.” Continuing with the influence and propensity to buy examples, the reports can include lists of top Internet users in terms of contextual influence scores or in terms of propensity to buy scores for one or more contexts. Reports can also include lists of Internet users that fit particular contexts (e.g., travel, automotive, home furnishings). Often, the context(s) is selected by or provided by a marketing or retailing entity. In some cases, such entities are looking to enhance knowledge of existing or potential customers.

Reports can be automatically generated, although in some cases, reports can result from a specific query for Internet users of one or more contexts. For instance, NIKON may provide e-mail addresses for existing customers and potential customers. Content is then collected and associated with social profiles that are created or that existed for the existing and potential customers, and scores are calculated for each social profile for each of one or more terms parsed from the collected content. A report can then be presented to NIKON providing contextual scores for the existing and potential customers in the one or more contexts (e.g., “entry-level cameras”). After receiving the report, NIKON may realize that it wants further information on the existing and potential customers, but for different contexts (e.g., “enthusiast cameras” and “professional-level cameras”). So, NIKON may make a query for a report showing top influencers given the contexts “enthusiast cameras” and “professional-level cameras.” Another report can be returned showing top influencers based on only those scores for terms that are found to match these two contexts (e.g., terms like “D600” and combinations of terms like “full-frame cameras” and “pro camera”).
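One way to picture how per-term scores roll up into a contextual report is sketched below. The aggregation rule (a plain average over matching terms), the context-to-term mapping, and all names and numbers are assumptions for illustration; the disclosure does not fix a formula.

```python
def contextual_score(term_scores, context_terms):
    """Average a profile's per-term scores over the terms matching a context."""
    matched = [term_scores[t] for t in context_terms if t in term_scores]
    return sum(matched) / len(matched) if matched else 0.0


def top_users(profiles, context_terms, limit=10):
    """Rank users by contextual score, dropping users with no matching terms."""
    ranked = sorted(
        ((contextual_score(scores, context_terms), user)
         for user, scores in profiles.items()),
        reverse=True,
    )
    return [user for score, user in ranked[:limit] if score > 0]


# Hypothetical mapping from a context to the terms that match it.
CONTEXTS = {"enthusiast cameras": {"d600", "full-frame"}}

# Hypothetical per-term scores already stored on three social profiles.
profiles = {
    "alice": {"d600": 0.9, "full-frame": 0.7, "sleds": 0.1},
    "bob": {"d600": 0.4},
    "carol": {"sleds": 0.8},
}

ranking = top_users(profiles, CONTEXTS["enthusiast cameras"])
```

A follow-up query for a different context would reuse the stored term scores and only change which terms are selected.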

The scores can be determined for one or more influencers based on one or more of the following three values: “reach,” “relevance,” and “impact.” Each of the three values is calculated relative to the terms parsed from the collected content in operation 902 and further for each Internet user in a set of Internet users (e.g., those associated with e-mail addresses provided by a marketing entity or retailer). In the case of determining an influence of an Internet user, the higher the reach, relevance, and impact, the more likely that user is to be influential on other Internet users.

For the purposes of this disclosure, “reach” represents how connected a user is to other users relative to a term. Reach represents both a number of relationships that a user has to other users as well as a quality of those relationships. In some embodiments, reach can be calculated from a graph that includes nodes connected by edges, where each node represents a user and each edge represents a relationship between two users (or nodes). The graph can be used to calculate reach scores for each user, for instance by counting a number of nodes that a user is connected to.
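The simplest reach calculation mentioned above, counting the nodes a user is connected to, can be sketched as follows. The adjacency-set representation, the function names, and the treatment of edges as undirected are assumptions made for this illustration.

```python
def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reach_by_degree(graph, node):
    """Reach as the count of distinct nodes directly connected to node."""
    return len(graph.get(node, set()))


# Illustrative edges between numbered user nodes.
edges = [(1, 6), (1, 7), (6, 7), (7, 11)]
graph = build_graph(edges)
reach_of_7 = reach_by_degree(graph, 7)
```

This count ignores relationship quality, which the disclosure also treats as part of reach.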

For the purposes of this disclosure, “relevance” indicates how germane a user's content is to a term. For instance, given the term “camera,” a user who publishes content about photography will often have greater relevance than a user who publishes content about French Film History. Relevance is based upon a quantity of content generated over a period of time that is germane to a term and how relevant each piece of content is to the term.
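A relevance value combining both ingredients, how much content falls within a time period and how germane each piece is, might be sketched as below. The per-item `term_relevance` numbers, the 90-day window, and the use of a simple sum are all assumptions; the disclosure does not specify how per-item relevance is measured or combined.

```python
from datetime import datetime, timedelta


def relevance(items, term, now, window_days=90):
    """Sum per-item relevance to a term over items published within the window."""
    cutoff = now - timedelta(days=window_days)
    return sum(
        item["term_relevance"].get(term, 0.0)
        for item in items
        if item["published"] >= cutoff
    )


now = datetime(2013, 2, 21)
# Hypothetical content items with precomputed per-term relevance in [0, 1].
items = [
    {"published": datetime(2013, 2, 1), "term_relevance": {"camera": 0.5}},
    {"published": datetime(2013, 1, 15), "term_relevance": {"camera": 0.25, "film": 0.5}},
    {"published": datetime(2012, 6, 1), "term_relevance": {"camera": 1.0}},  # outside window
]
camera_relevance = relevance(items, "camera", now)
```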

For the purposes of this disclosure, “impact” indicates how much measurable action a user's content causes relative to a term. This can be measured by analyzing the effects of a user's content relative to a term, for instance, by analyzing retweets, shares, comments, likes, links, etc. that also mention the same term.
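As a rough illustration, impact for a term could be approximated by counting the responses to a user's content that mention that term. The record layout and the unweighted count are assumptions for this sketch; different action types could equally be weighted differently.

```python
import re


def impact(responses, term):
    """Count responses (retweets, comments, shares, ...) that mention the term."""
    term = term.lower()
    return sum(
        1
        for response in responses
        if term in re.findall(r"[a-z0-9]+", response["text"].lower())
    )


# Hypothetical responses to one user's content.
responses = [
    {"action": "retweet", "text": "RT: Nikon D600 high ISO is amazing"},
    {"action": "comment", "text": "Which lens did you use?"},
    {"action": "share", "text": "Great Nikon review"},
]
nikon_impact = impact(responses, "Nikon")
```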

FIG. 1 illustrates one embodiment of a system that collects content or other raw data, associates the content with one or more social profiles and/or creates new profiles, and returns reports showing contextual scores and/or rankings for one or more users in one or more contexts. The system 100 includes a client side and a server side. The client side can provide or select seed information for a crawler module 104, such as product or service descriptions or customer data (e.g., e-mail addresses for customers and potential customers), and the server side can return a report 126 providing information about Internet users based on analysis of content pulled by the crawler module 104. For instance, the report 126 can list users who are most influential relative to a certain type of product or a list of users most likely to buy a certain type of product. The client side can also optionally make a query 124 that leads to generation of one or more further reports 126 based on the query 124.

In particular, a client on the client side can provide or select product descriptions, service descriptions, and/or customer data 102. The client side can include a remote computer operated by a seller of goods and/or services or a marketing entity. The client side can either select the product/service/customer data 102 from selection options presented by the server side through an API 122, or the client side can provide this data 102 when prompted to by the API 122. For instance, the client side may be presented with the terms “shoes,” “food,” and “smartphones,” and the operator of the client side may select “shoes.” The data 102 is then used to guide a crawler module 104 which collects content (e.g., webpages) and other raw data (e.g., metadata). For instance, if a product selection for “sleds” is made to the API 122, then the crawler module 104 may be directed to crawl for webpages where users have mentioned the words, “sleds,” “sledding,” or “snow.”

In some embodiments, the server side can use the data 102 to determine if there are sufficient social profiles in a searchable social profile datastore 118 to meet the needs of the client side. If there are, then a report 126 can be returned to the client side without further server side action, and/or the client side can be prompted to submit a query 124. If it is found that there are insufficient social profiles in the searchable social profile datastore 118, then a crawler module 104 can crawl a plurality of universal resource locators (URLs) and collect content and other raw data. The URLs can be generated based on the product, service, and/or customer data 102.

The crawler module 104 can return content (e.g., webpages) or other raw data from the URLs that are crawled. The parser 106 can then parse the raw data and pass this parsed data (e.g., terms and embedded links) to an analysis module 108, which can use the parsed data to compute values such as, but not limited to, reach, relevance, and impact. The results of the analysis module 108 can be passed to a scoring module 110 that determines scores for each social profile for each term parsed by the parser 106. For instance, the scoring module 110 can determine contextual influence scores.

The values from the analysis module 108 and the scores from the scoring module 110 are then used to populate social profiles in the searchable social profile datastore 118, where each value and score is associated with a social profile and each social profile includes values and scores for all the terms parsed by the parser 106. Sometimes, a new social profile has to be created before values and scores can be added to the profile.
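The create-or-update behavior described above can be sketched with a small in-memory structure. The `SocialProfile` class, the `upsert` function, and the use of a plain dictionary as the datastore are assumptions for illustration and say nothing about how the datastore 118 is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class SocialProfile:
    """Hypothetical profile record holding per-term values and scores."""
    profile_id: str
    values: dict = field(default_factory=dict)  # term -> {"reach": ..., "impact": ...}
    scores: dict = field(default_factory=dict)  # term -> {"contextual_influence": ...}


def upsert(datastore, profile_id, term, values, scores):
    """Create the profile if it is missing, then merge in values and scores."""
    profile = datastore.get(profile_id)
    if profile is None:
        profile = SocialProfile(profile_id)
        datastore[profile_id] = profile
    profile.values.setdefault(term, {}).update(values)
    profile.scores.setdefault(term, {}).update(scores)
    return profile


datastore = {}
upsert(datastore, "user-1", "d600", {"reach": 3}, {"contextual_influence": 0.8})
upsert(datastore, "user-1", "d600", {"impact": 2}, {})
```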

Before, in parallel to, or after the analysis module 108 and the scoring module 110 determine values and scores, respectively, the analysis module 108 can create new profiles in the searchable social profile datastore 118, populate the new profiles, and update existing profiles. Populating and updating can include adding or modifying values, scores, and content or other raw data of a social profile. Additionally, a profile enrichment module 120 can determine actual names to assign to each social profile. Once the social profiles are populated or updated and actual names are assigned to each profile, the social profiles are ready to be used in determining contextual scores and generating the report 126 and responding to the optional query 124.

The following provides further details of the various components of the system 100. The client side can involve all components and processes carried out by or on a client computing device such as a remote web browser. The server side can include all components and processes carried out by or on a server or set of servers providing services to the client side.

Customer data can include records for existing or potential customers including fields such as e-mail address, handles for customers, or URLs of webpages where the customer has created content, demographics, interests, and buying habits/trends, to name a few. Recency, frequency, and monetary value of customer transactions are other examples of customer data.

The crawler module 104 can be embodied in a web crawler such as, but not limited to, Apache Nutch. The crawler module 104 can access a plurality of webpages and create a copy of each visited webpage. The copies can be provided to the parser 106 and stored in the crawler module 104.

The crawler module 104 can operate via a single computing device or a set of computing devices operating in parallel, thus enabling faster crawling. The crawler module 104 can crawl most publicly-accessible webpages. While the crawler module 104 can be seeded by the data 102, it can also continually run, or run in the background, in order to expand the searchable social profile datastore 118. In some embodiments, there may be feedback instructions from one or more of the mentioned modules or the searchable social profile datastore 118 guiding the crawler module 104 on further crawling tasks.

When the system 100 is in early operation and the searchable social profile datastore 118 is not very large, the client side will typically have to wait longer for a report 126 since the crawler module 104 will have to collect content or other raw data in order to generate sufficient social profiles to generate a useful report 126. Over time, the searchable social profile datastore 118 will expand and client side requests will be met in shorter periods of time since the system 100 can generate reports 126 based on the existing searchable social profile datastore 118, rather than having to expand the datastore 118 in response to a client side request.

Additionally, the parser 106 can extract information from content or other raw data provided by the crawler module 104 such as text that is visible on a webpage (e.g., comments, blog posts, public profiles) and metadata (e.g., WORDPRESS embeds profile information into links from a WORDPRESS webpage). This content or other raw data can then be passed to the analysis module 108.

The searchable social profile datastore 118 can include fields such as real name, location, contact information, demographics, and tastes, to name a few. It can also include values from the analysis module 108 and scores from the scoring module 110, where both can be mapped to corresponding social profiles. These social profiles can also include content and other raw data from the crawler module 104 and/or the parser 106.

Turning to the report 126, this may also include ordered lists or sets of users that fit within certain contexts or categories, or alternatively, the report 126 may include contexts or categories that are associated with certain users. The report 126 can include one or more of the following data fields: name; scores (e.g., contextual influence, propensity to purchase product X); reach, relevance, and impact; recent searches where the user has been discovered; and URLs where the user has published content. Additionally, a report 126 may include a quantitative assessment of a confidence that the system 100 has in the placement of a user into a context or category in the report 126. The report 126 may also include any other scores or values that were calculated, such as reach, relevance, and impact.

FIG. 2 illustrates another embodiment of a system that collects content or other raw data, associates the content or other raw data with one or more social profiles and/or creates new profiles, and returns reports showing contextual scores for one or more users in one or more contexts. The system 200 operates similarly to that of system 100 in that a client side provides or selects descriptions of products, services, and/or customer data 202 via an API 222 that seeds a crawler module 204. Here, there is also a URL generator 209 that generates URLs from the data 202, and passes the URLs to the crawler module 204. The crawler module 204 again obtains content and other raw data, and a parser 206 parses what the crawler module 204 returns. In some instances, the parser 206 can pass instructions and URLs back to the crawler module 204 seeding further crawls. However, instead of an analysis module 108, here there are modules for computing reach, relevance, and impact (a compute reach module 210, a compute relevance module 212, and a compute impact module 214, respectively).

There is also a graph builder module 208 for building a graph datastore that is used to populate profiles in the searchable social profile datastore 218 which maps reach, relevance, impact, and content and other raw data to users and their social profiles as well as relationships and relationship metadata between the profiles. A profile enrichment module 220 enhances profiles in the searchable social profile datastore 218, for instance, by determining actual names to associate with each social profile. The graph datastore can also be used to compute reach. The reach, relevance, and impact values can be passed to a scoring module 216 which determines scores that can be saved in a searchable social profile datastore 218. One or more reports 226 can then be generated from the searchable social profile datastore 218 and returned to the client side via the API 222. The reports 226 can be based on contextual scores calculated from the scores in the searchable social profile datastore 218. The client side can again optionally make queries 224 via the API 222.

The following discusses the various components of the system 200 in greater depth. As noted above, the parser 206 can send information back to the crawler module 204 to guide the crawler module 204 in further crawls. This information may include URLs that are found within content or other raw data obtained by the crawler module 204 or may be URLs generated from data or metadata in the content or other raw data obtained by the crawler module 204. For instance, the crawler module 204 may return a YOUTUBE comment that references a link to a product review, and this link can be returned as a URL to the crawler module 204 for further crawling. In some embodiments, the parser 206 parses terms from the content collected by the crawler module 204 and also monitors term frequency. Both the parsed terms and term frequency can be stored in the searchable social profile datastore 218.
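The two parser outputs described above, URLs to feed back to the crawler and term frequencies to store, can be sketched as follows. The regular expressions, the function name `parse_content`, and the sample comment are assumptions for illustration; a production parser would need more careful URL and HTML handling (for example, trimming trailing punctuation from links).

```python
import re
from collections import Counter

# Simplified URL matcher (assumption): stops at whitespace, quotes, or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def parse_content(text):
    """Return (urls_for_further_crawling, term_frequency) for a piece of content."""
    urls = URL_PATTERN.findall(text)
    text_without_urls = URL_PATTERN.sub(" ", text)
    terms = re.findall(r"[a-z0-9]+", text_without_urls.lower())
    return urls, Counter(terms)


comment = "Great video! Full Nikon review here: https://example.com/review and more Nikon tips"
urls, term_frequency = parse_content(comment)
```

The returned URLs could be queued for the crawler, while the counter maps each parsed term to how often it occurred.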

The parser 206 passes parsed content and other raw data to the graph builder module 208. The graph builder module 208 builds a graph datastore (e.g., see FIG. 4) comprising nodes (users) and edges (relationships between users) and stores the graph datastore in the searchable social profile datastore 218. Initially, a node is created for each crawled URL, but later, the graph builder module 208 merges nodes that it deems are attributable to the same user. For instance, where two blog posts both contain ‘melinks’ to the same user, the nodes representing the two blog posts can be merged.

The graph builder module 208 adds nodes and edges to the graph datastore whenever the crawler module 204 returns content or other raw data. For instance, FIG. 4 illustrates a graph datastore having nodes (users) and edges (relationships) where new nodes and new edges (e.g., nodes 13, 17, 18, and 19 and related new edges) are added to the graph datastore as a result of newly-crawled content. A relationship can be evidenced by a variety of factors such as actions (e.g., referencing another TWITTER username in a tweet). Multiple actions can give rise to a single edge. In some cases, newly-crawled content may merely lead to the addition of a new edge, for instance, as illustrated in FIG. 5 between nodes 7 and 11.

Where two nodes are recognized as being associated with content that was created by the same user, the two nodes can be merged, as is illustrated in FIG. 6, where nodes 1 and 4 have been combined into node 1-4. Determining whether to merge two nodes can be performed by analyzing common attributes between nodes. Commonality between edges is another factor that can be used to determine if nodes should be merged. For instance, even where not all attributes are the same for two nodes, if all or most edges are common for two nodes, then it is likely that the two nodes are the same and should be merged. For instance, in FIG. 8, nodes 1 and 2 may be merged since their edges connect to the same three nodes. Node merger also entails merger of social profiles in some cases since newly-crawled webpages may lead to the discovery that two previously-created nodes, and their corresponding social profiles, should be merged.
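The edge-commonality test can be illustrated with a neighbor-overlap measure. The Jaccard-style ratio, the 0.8 threshold, and the function names are assumptions chosen for this sketch; the disclosure only states that nodes sharing all or most edges are likely the same user.

```python
def neighbor_overlap(graph, a, b):
    """Fraction of the two nodes' combined neighbors that they share."""
    neighbors_a = graph[a] - {b}
    neighbors_b = graph[b] - {a}
    union = neighbors_a | neighbors_b
    return len(neighbors_a & neighbors_b) / len(union) if union else 0.0


def merge_nodes(graph, keep, drop):
    """Fold node `drop` into node `keep`, rewiring drop's edges to keep."""
    for neighbor in graph.pop(drop):
        graph[neighbor].discard(drop)
        if neighbor != keep:
            graph[neighbor].add(keep)
            graph[keep].add(neighbor)
    return graph


# Nodes 1 and 2 connect to the same three nodes, as in the FIG. 8 example.
graph = {1: {3, 4, 5}, 2: {3, 4, 5}, 3: {1, 2}, 4: {1, 2}, 5: {1, 2}}
MERGE_THRESHOLD = 0.8  # hypothetical threshold criterion

overlap = neighbor_overlap(graph, 1, 2)
if overlap >= MERGE_THRESHOLD:
    merge_nodes(graph, keep=1, drop=2)
```

In a fuller system the corresponding social profiles would be merged at the same point, and attribute comparisons would be combined with the edge test.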

Node creation can be cautious: a node is created for every crawled URL regardless of authorship. Node merger is less cautious: nodes are merged whenever a threshold criterion for merger is met.

Separate from node merger is the task of associating real user names with nodes and their associated social profiles. Actual user names (e.g., “John Smith”) are metadata that are parsed from webpages and stored in the searchable social profile datastore 218. The profile enrichment module 220 can be configured to determine proper names for each social profile. While some content is easily associated with a user's actual name, such as a tweet, other content is harder to tie to a user's actual name, for instance, a blog post written under a pseudonym or forum comments under arbitrary usernames. In some cases, there may be more than one author for a single piece of content, and hence two authors may be assigned to a single node. Other difficult situations include ones where there are two or more authors of a piece of content, and one author has an existing social profile while the other does not. The profile enrichment module 220 can utilize a ‘best guess’ algorithm to determine an actual name most likely to be associated with each social profile.

Once the graph datastore has been created, the compute reach module 210 reads the graph datastore from the searchable social profile datastore 218 and determines a reach value for each node relative to each term parsed by the parser 206 for the URL that underpins each node. Reach values can then be passed to the scoring module 216 and stored in the searchable social profile datastore 218. Reach represents a size and strength of a user's social graph for a given term. There are a variety of ways to compute reach. For instance, reach can be a function of unique pathways (a series of one or more consecutive edges) between a node and every other node in the portion of the graph datastore being analyzed. In other words, each pathway is assigned a score and the reach for a node is a function of the scores for each pathway to or from the node.

In another embodiment, reach may be based on a number of unique pathways between a node and each other known node, where only a single pathway between any two nodes is considered. For instance, even though there are at least two unique pathways between node 1 and node 7 (1-7; 1-6-7), the reach score may only be based on one of these pathways (e.g., the shortest pathway).

Each pathway can have different effects on the reach value based on the quality of the relationships that the pathway represents. For instance, reach may be inversely related to a number of edges in a pathway since more edges mean greater degrees of separation between users. Individual edges can also have a quality, which may be reflected in a score or weight applied to each edge. For instance, a retweet may carry less weight than an action of following another user on TWITTER. Various other algorithms can also be used to determine a quality of relationships between nodes (e.g., the effect that a unique pathway between the two nodes has on the reach score for those nodes). Thus, the score for a pathway may be a weighted sum of the edges in a pathway, and in this way can reflect both degrees of separation between users and a quality of the relationships connecting users.
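By way of non-limiting illustration, the pathway scoring described above can be sketched as follows. The edge costs, the use of Dijkstra's algorithm, and the choice to sum inverse pathway costs are assumptions made for this sketch only; the disclosure does not prescribe them. A lower cost here stands for a stronger relationship, so a retweet (less weight) is given a higher cost than a follow.

```python
import heapq

# Hypothetical per-edge costs: a lower cost means a stronger relationship.
EDGE_COST = {"follow": 1.0, "retweet": 2.0}

def pathway_costs(graph, source):
    """Dijkstra: cheapest weighted pathway cost from source to each reachable node."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, kind in graph.get(node, []):
            new_cost = cost + EDGE_COST[kind]
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return best

def reach(graph, source):
    """Sum of inverse pathway costs, so longer or weaker pathways add less."""
    costs = pathway_costs(graph, source)
    return sum(1.0 / c for node, c in costs.items() if node != source)

# Nodes 1, 6, and 7 with pathways 1-7 and 1-6-7, as in the example above.
graph = {
    1: [(6, "follow"), (7, "retweet")],
    6: [(7, "follow")],
    7: [],
}
```

Only one pathway per pair of nodes contributes in this sketch, which corresponds to the embodiment in which a single (e.g., cheapest) pathway between any two nodes is considered.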

In one embodiment, an all-pairs shortest path algorithm can be used to determine a shortest pathway between nodes in a graph datastore. The all-pairs shortest path algorithm determines whether there is a pathway between any two nodes and, if so, determines the minimum number of edges between those two nodes. Traditional methods of determining distances between nodes in a changing graph perform an all-pairs shortest path calculation (or similar distance measurement) after every edge is added to the graph. This causes numerous problems that have not been solved in the art. This disclosure uses a method whereby a plurality of edges are added to the graph and then an iterative all-pairs shortest path algorithm is executed. In this way, reach scores for nodes are updated after a plurality of nodes and/or edges have been added to the graph, rather than updating reach scores after every node and/or edge is added. This enables the scalability of the system 200 that some in the industry (e.g., GOOGLE) have said was not practical.
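The batching idea can be illustrated with the following minimal sketch, in which edges are queued and distances are recomputed only when the batch is flushed. The class name BatchedGraph, its methods, and the use of a breadth-first search from every node are hypothetical choices for illustration; the breadth-first search stands in for whatever iterative all-pairs algorithm a given embodiment uses. Edges are treated as directed.

```python
from collections import deque

def all_pairs_hops(adjacency):
    """Return {source: {target: minimum edge count}} via BFS from every node."""
    result = {}
    for source in adjacency:
        hops = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency.get(node, ()):
                if neighbor not in hops:
                    hops[neighbor] = hops[node] + 1
                    queue.append(neighbor)
        result[source] = hops
    return result

class BatchedGraph:
    """Queues new edges and recomputes distances once per batch."""

    def __init__(self):
        self.adjacency = {}
        self.pending = []
        self.hops = {}

    def add_edge(self, src, dst):
        # Queue the directed edge; no shortest-path work happens here.
        self.pending.append((src, dst))

    def flush(self):
        # Apply the whole batch, then run all-pairs shortest paths once.
        for src, dst in self.pending:
            self.adjacency.setdefault(src, set()).add(dst)
            self.adjacency.setdefault(dst, set())
        self.pending.clear()
        self.hops = all_pairs_hops(self.adjacency)

g = BatchedGraph()
g.add_edge(1, 6)
g.add_edge(6, 7)
g.add_edge(1, 7)
hops_before_flush = dict(g.hops)
g.flush()
```

A target that is absent from g.hops[source] has no pathway from that source, which answers the "is there a pathway" question as well as the minimum edge count.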

In some embodiments, relationships between nodes can be directional as shown in FIG. 7. For instance, following someone on TWITTER is a one-way relationship. However, because having a contact on LINKEDIN requires reciprocation from the other user, that relationship is two-way. Reach scores can be affected by the directionality of the relationships between users. For instance, pathways between nodes may only exist where there is a continuous series of edges in the same direction.

Reach can also be calculated for different terms for each user. Thus, reach values may differ for different terms given the same graph datastore. In other embodiments, a pathway may only be considered in a reach score if all, or some threshold number, of nodes in the pathway are associated with the relevant term.

In some embodiments, reach can be calculated for each user relative to a portion of or relative to all nodes. For example, where computing resources are to be saved, reach may only be calculated based on pathways having fewer than a threshold number of edges. In another example, the portion of all nodes can include only those nodes for a customer set provided by the client side (e.g., those nodes corresponding to a list of e-mail addresses for existing customers of the operator of the client side).

The compute relevance module 212 computes a relevance score and passes this to the searchable social profile datastore 218. In some embodiments, an inverted index of user-created content can be mined to ascertain relevance.

A compute impact module 214 takes the parsed content and other raw data and calculates impact values which are passed to the searchable social profile datastore 218. Impact can be determined by calling APIs of content sharing sources (e.g., DISQUS, TWITTER) that return data indicating how many actions have been taken in response to published content. The API calls can be carried out via the compute impact module 214. The API calls may also return identifications of the users who take actions in response to published content and/or what the actions were. For instance, an API call to TWITTER may return lists of users who retweeted a tweet, and an API call to DISQUS may return lists of users who are active within a forum. The APIs may also return timing data for each action taken relative to a piece of content. The effect of actions on the impact score can be weighted by a time that elapsed between the content creation and the responding action.

The types of actions taken in response to a piece of content can be categorized and weighted. For instance, when a first user writes about a second user's blog post, that action may carry greater weight in determining impact than a third user's mere viewing of the second user's blog post. In some cases, actions taken in response to a piece of content may only include other pieces of content rather than all types of actions.

In some embodiments, impact can look not only at a quality and quantity of actions taken in response to a piece of content, but also at the relationship between the user who created the content and the user who took action in response thereto. For instance, an algorithm may give great weight to users who take action in response to content and are far removed from the author (e.g., one's influence is likely greater when people other than friends and family respond to content than when mere friends and family respond). Thus, the weight assigned to each action taken in response to a piece of content towards the impact score may be based in part on a number of edges between the node representing a user that authored the content and the node representing a user that responded to the content. This algorithm can alternatively be incorporated into the scoring module 216 and thus used during computation of score rather than during computation of impact value.
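The three weighting factors discussed above (type of action, elapsed time, and distance between author and responder) can be combined as in the following sketch. The specific action weights, the 24-hour half-life, the assumption that quicker responses count for more, and the linear distance factor are all hypothetical values chosen for illustration and are not taken from this disclosure.

```python
# Hypothetical weights per action type: writing about content outweighs viewing it.
ACTION_WEIGHT = {"view": 0.1, "retweet": 1.0, "blog_post": 3.0}
HALF_LIFE_HOURS = 24.0  # assumed: an action's weight halves every 24 hours of delay

def action_weight(kind, hours_elapsed, edges_from_author):
    """Weight of one responding action toward the impact value."""
    type_weight = ACTION_WEIGHT[kind]
    time_weight = 0.5 ** (hours_elapsed / HALF_LIFE_HOURS)
    # Assumed: responders further removed from the author count for more.
    distance_weight = float(edges_from_author)
    return type_weight * time_weight * distance_weight

def impact(actions):
    """Sum the weights of all (kind, hours_elapsed, edges_from_author) actions."""
    return sum(action_weight(*action) for action in actions)

actions = [
    ("retweet", 0.0, 1),     # immediate retweet by a direct follower
    ("retweet", 24.0, 2),    # retweet a day later, two edges away
    ("blog_post", 48.0, 4),  # blog post two days later, four edges away
]
total_impact = impact(actions)
```

The same distance factor could instead be applied in the scoring module, as noted above, by passing unweighted impact values through and applying the factor when the final score is computed.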

By making these API calls and receiving data in return, the compute impact module 214 collects a set of data that can be analyzed to determine what impact a piece of content had on other users. In some embodiments, a depository of content and metadata can be mined to calculate impact (see, e.g., a content metadata datastore 340 in FIG. 3).

Reach, relevance, and impact (and/or other scores) are passed to the scoring module 216 where they are used to determine a score (e.g., influence or propensity to buy) for each of a plurality of nodes and each of a plurality of terms. The scores are stored in the searchable social profile datastore 218 and mapped to corresponding social profiles. The searchable social profile datastore 218 thus comprises the graph datastore, where each node of the graph datastore includes, or is mapped to, a social profile that can include any one or more of: metadata, scores, reach value, relevance value, impact value, demographics, contact information, name, geography, social handles, outlets, and other profile data.

The scoring module 216 can apply a variety of algorithms to calculate scores and apply various weights to each of reach, relevance, and impact. One example is A*reach + B*relevance + C*impact, where A, B, and C are weights. Another example is B*reach*(A*reach + C*impact). These are just two examples of the multitude of algorithms that the scoring module 216 can use. Furthermore, they highlight the fact that weights can be applied by the compute reach, relevance, and impact modules 114, 116, 118 or by the scoring module 216. In an alternative embodiment, weights for reach, relevance, and impact can be applied in the compute reach module 210, the compute relevance module 212, and the compute impact module 214, respectively.
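The two example formulas can be written out as the following sketch. The function names and the default weights of 1.0 are assumptions for illustration; the second function reproduces the second formula exactly as written above.

```python
def linear_score(reach, relevance, impact, a=1.0, b=1.0, c=1.0):
    """First example: A*reach + B*relevance + C*impact."""
    return a * reach + b * relevance + c * impact

def product_score(reach, impact, a=1.0, b=1.0, c=1.0):
    """Second example as written above: B*reach*(A*reach + C*impact)."""
    return b * reach * (a * reach + c * impact)

def score_all_terms(values_by_term, a=1.0, b=1.0, c=1.0):
    """Score one node for every term; values are (reach, relevance, impact) tuples."""
    return {
        term: linear_score(r, rel, imp, a, b, c)
        for term, (r, rel, imp) in values_by_term.items()
    }

node_values = {"tablet": (2.0, 3.0, 4.0), "camera": (1.0, 0.0, 2.0)}
node_scores = score_all_terms(node_values, a=0.5, b=1.0, c=0.25)
```

Because the weights are plain parameters, they can equally be applied upstream in the compute modules by scaling the values before they reach the scoring function and leaving a, b, and c at 1.0.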

Scores can be calculated for each user and for each term parsed from content generated by that user. For instance, each node may have 300-1000 scores, where each score is determined for a different term. In some cases, scores can be based on one or more of reach, relevance, and impact. For instance, influence scores can be calculated from all three of these, while propensity to buy may only be calculated using relevance. When a query 224 is made for users in a given context, the searchable social profile datastore 218 has scores precalculated for hundreds if not thousands of terms that can be used to calculate contextual scores in response to the query 224.

After a first report 226 is provided to the client side, the client side can make an optional query 224 for users. Alternatively, the query 224 can be passed to the API 222 along with the product descriptions, service descriptions, and/or customer data 202. The query 224 can include one or more contexts, where a context describes a combination of one or more terms. The API 222 can take the one or more contexts and search the searchable social profile datastore 218 for all profiles having scores for terms that match the one or more contexts of the query 224, or search for a set of profiles in the queried context having the highest scores. Contextual scores can be generated in response to the query 224 and organized, along with corresponding social profiles, in a report 226 and returned via the API 222 to the client side.
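One way the precalculated per-term scores might be combined into a contextual score at query time is sketched below. Averaging the per-term scores and requiring a profile to have a score for every term in the context are assumptions made for this sketch; the disclosure does not specify how contextual scores are derived. The function and variable names are hypothetical.

```python
def contextual_scores(profiles, context, top_k=None):
    """Rank profiles for a context (a list of terms) by mean per-term score."""
    results = []
    for name, term_scores in profiles.items():
        # Assumed: a profile matches only if it has a score for every term.
        if all(term in term_scores for term in context):
            score = sum(term_scores[term] for term in context) / len(context)
            results.append((name, score))
    results.sort(key=lambda item: item[1], reverse=True)
    return results if top_k is None else results[:top_k]

profiles = {
    "alice": {"tablet": 1.0, "computer": 0.5},
    "bob": {"tablet": 0.25},
    "carol": {"tablet": 0.5, "computer": 0.5},
}
report = contextual_scores(profiles, ["tablet", "computer"])
top_one = contextual_scores(profiles, ["tablet", "computer"], top_k=1)
```

Passing top_k corresponds to searching for the set of profiles with the highest scores in the queried context, while omitting it returns all matching profiles.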

The searchable social profile datastore 218 can also provide data to a system control 250 that passes refresh instructions back to the crawler module 204. The refresh instructions can cause the crawler module 204 to crawl URLs that it may have previously crawled, but where the content and metadata from the last crawl has become stale.

FIG. 3 illustrates yet another system that collects content, associates the content with one or more social profiles and/or creates new profiles, and returns reports showing scores for one or more users in one or more contexts. The components and function of system 300 are similar to those discussed in FIG. 2 except that here various components have replaced the searchable social profile datastore 218, and the system 300 includes a content metadata datastore 340.

The graph datastore, content and other raw data, values, and scores are stored in the social profile datastore 318. A search optimization module 342 prepares social profiles in the social profile datastore 318 for either lookup or search index. In other words, the search optimization module 342 organizes and formats the social profiles to make it faster and more accurate to perform lookups of contexts that certain users fit into or searches for users that have high contextual scores. The social profile lookup index 344 comprises an index of contexts that are associated with one or more users. The social profile search index 346 comprises an index of users that are associated with one or more terms and their scores for each term. Therefore, the query 324 can either look for contexts that one or more users fit into, or look for one or more users having high contextual scores for selected contexts. The query 324 calls the API 322, which then interacts with either the social profile lookup index 344 or the social profile search index 346 in order to respond to the query 324.
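The difference between the two indexes can be illustrated with the following sketch, which builds both from the same per-term scores. The function build_indexes and the data layout are hypothetical, and individual terms stand in for contexts for simplicity. The lookup index answers "which terms does this user fit," while the search index answers "which users score highest for this term."

```python
from collections import defaultdict

def build_indexes(profiles):
    """profiles maps a user to {term: score}; returns (lookup_index, search_index)."""
    lookup_index = {}
    search_index = defaultdict(list)
    for user, term_scores in profiles.items():
        # Lookup index: user -> terms, strongest first.
        lookup_index[user] = sorted(term_scores, key=term_scores.get, reverse=True)
        for term, score in term_scores.items():
            search_index[term].append((user, score))
    # Search index: term -> (user, score) postings, highest score first.
    for postings in search_index.values():
        postings.sort(key=lambda posting: posting[1], reverse=True)
    return lookup_index, dict(search_index)

profiles = {
    "alice": {"tablet": 0.9, "camera": 0.2},
    "bob": {"tablet": 0.4},
}
lookup_index, search_index = build_indexes(profiles)
```

Building both structures ahead of time is what lets either kind of query be answered without scanning every social profile.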

Here, relevance and impact are stored in the content metadata datastore 340 and are accessed therein by the scoring module 316. The content metadata datastore 340 can also be a repository of content, raw data, relevance values, and impact values. The content and other raw data can be called upon and analyzed for a period of time, but eventually become stale. When this happens, a notice can be passed to the system control 350 that instructs the crawler module 310 to recrawl certain URLs in order to refresh the stale content or raw data. A mapping between the social profile datastore 318 and the content metadata datastore 340 can also be included in the system 300.

FIG. 10 illustrates a method for generating reports enhancing an understanding of Internet users based on their generated content and actions taken by others in response to the generated content. The method 1000 begins with a collect content or other raw data operation 1002 that collects content or other raw data via a crawler module such as crawler module 104. An associate content with a social profile operation 1004 then associates content with a social profile that can be stored in a searchable social profile datastore 118 or a social profile datastore 318 stored in a memory. The associate operation 1004 can be carried out via a graph builder module such as graph builder module 208. A calculate operation 1006 can then calculate scores for social profiles for different terms, where the terms are parsed from the content or other raw data. The calculate operation can be carried out via a combination of modules such as the analysis module 108 and the scoring module 110 or the compute reach module 210, the compute relevance module 212, the compute impact module 214, and the scoring module 216. Each score can be calculated for a node of a graph datastore and for all terms parsed from content or other raw data associated with the node.

The method 1000 then has two parallel or alternative paths. In the left path, a first receive operation 1008 can receive a query for users fitting certain contexts. For instance, the query (e.g., 124) may ask for top influencers fitting a context describing tablet computers. The method 1000 can then identify matching users via a first identify operation 1010, which may be carried out via the combination of an API, such as API 222, and a searchable social profile datastore such as 218. The first identify operation 1010 may also be carried out via a combination of an API, such as 322, and a social profile search index such as 346. The method 1000 may then return a report with matching users in a return operation 1016.

In the right path, a second receive operation 1012 receives a query for contexts that certain users fit. For instance, the query may ask for contexts that are associated with an e-mail address for an existing or potential customer. A second identify matching contexts operation 1014 can then identify contexts that are associated with a social profile matching the user specified in the query. This can be performed via a combination of an API, such as API 322, and a searchable social profile datastore such as 118. The second identify operation 1014 may also be carried out via a combination of an API, such as 322, and a social profile lookup index such as 344. The method 1000 may then return a report with matching contexts in the return operation 1016. The return operation 1016 can also return both matching users and matching contexts if the query calls for both.

FIG. 11 illustrates another method for generating reports enhancing an understanding of Internet users based on their generated content and actions taken in response to the generated content. The method 1100 can begin with either or both of a first receive operation 1102 or a second receive operation 1104. The first receive operation 1102 receives and categorizes products and services from a client side in order to seed a crawler module. The second receive operation 1104 receives existing or potential customer data to seed the crawler module. In some cases, there may be sufficient users in a social profile database as determined by a decision 1106, in which case the method 1100 can immediately turn to returning a report to the client side. As such, the method 1100 can jump to an optional receive query for social profiles or contexts operation 1132 or jump straight to a return report with social profiles or contexts operation 1134. The optional receive query for social profiles or contexts operation 1132 can also follow the return report operation 1134 and can then be followed by another return report operation 1134.

Where the decision 1106 determines that the social profile database does not have sufficient users, the method 1100 can turn to a crawler module to crawl URLs in order to generate new users in crawl operation 1110 and also crawl to further populate existing social profiles in crawl operation 1108. Note that even where the decision 1106 is affirmative, the method 1100 can optionally also crawl to further populate the social profile database via operation 1108 and all subsequent operations as discussed below. The method 1100 then parses content or other raw data from the crawler module in a parse operation 1112.

The content or other raw data can then be associated with social profiles in an associate content operation 1114. Reach, relevance, and impact values are then calculated via a compute reach values operation 1116, a compute relevance values operation 1118, and a compute impact values operation 1120. A compute scores operation 1122 can then take the reach, relevance, and impact values and compute scores. In one embodiment, scores can be computed for each node in a graph datastore and for each term associated with each node.

Where a social profile does not exist for the content or other raw data, new social profiles can be created and populated via create and populate new social profiles operation 1124. This can include determining an actual name to associate with the social profile, populating the profile with scores and values, and mapping content and other raw data to the profile. Where a social profile does exist, it can be updated with the scores, values, and parsed content or raw data via an update existing social profiles operation 1126.

At this point, the method 1100 can execute optional operation 1132 and operation 1134 as discussed above for situations where sufficient users existed in the social profile database according to decision 1106.

Social Marketing Platform

A further aspect of the disclosure describes systems, methods, and apparatus for marketing based on consumer data extracted from at least social media content and Internet content. In particular, initial consumer data can be accessed that includes at least an identifier of a consumer and may include further consumer data such as the date, time, cost, and product of a recent purchase. Using this initial consumer data, and in particular the consumer identifier, additional consumer data can be accessed, for instance by purchasing additional consumer data from a consumer data provider or by extracting additional consumer data from social media (e.g., FACEBOOK profiles, TWITTER TWEETS, product reviews, to name a few) and various other sources of Internet content.

To make the combination of initial and additional consumer data more manageable (the combination will be referred to as consumer data), a subset of the combined consumer data can be selected as data most relevant to identifying consumers likely to make another purchase and/or influence others to make a purchase. The subset of consumer data can be evaluated to generate derived fields such as in-market status and influence on other consumers. The subset of consumer data and the derived fields can periodically be updated, for instance via a nightly evaluation to update the derived fields. The subset of consumer data and the derived fields, either alone or in combination, can also be analyzed in a segmentation operation that places the consumers into segments. The segments can be used to recommend marketing strategies to clients or to suggest groups of consumers that are best suited to receive further marketing content.

FIG. 12 illustrates one arrangement of logical components and functions for marketing based on social media content and other Internet content. Initially, consumer data can be acquired in acquire consumer data operation 1202. A subset of the consumer data 1222 can be selected in the data selection operation 1204 and passed to a mart 1220 or to an evaluation operation 1206. Either way, the evaluation operation 1206 takes the subset of consumer data 1222 and generates one or more derived fields 1224 that can be added to or used to update existing derived fields 1224. Segmentation 1208 can then use the derived fields 1224 and/or the subset of consumer data 1222 to place consumers into segments 1226. The segments 1226 can be used to trigger promotions or other marketing methods in a trigger operation 1210. The segments 1226 can be provided to marketing clients in a provide operation 1212 so that the marketing clients can use the segments to develop or modify their marketing strategies. The segments 1226 can also be used to make individual consumer suggestions in individual suggestion operation 1214 so that one or more clients can tailor marketing strategies to individual consumers.

Acquiring Consumer Data

The acquire consumer data operation 1202 can first include acquiring identifying information regarding one or more consumers. The identifying information (e.g., e-mail address, phone numbers, mailing address) can be part of initial consumer data 1232 that also can include, for instance, purchase history, demographics, and consumer affinity. The identifying information can be used to acquire additional consumer data, for instance via purchase from a consumer data supplier 1230 or via extraction from the Internet 1228.

The initial consumer data 1232 can include an e-mail address or other identifying information for one or more consumers and purchase attributes for each of the one or more consumers. Other examples of consumer data include, but are not limited to, demographics, behavior on social networks, consumer affinity, recency of a purchase, purchase frequency, and monetary attributes of those purchases. These attributes may also be characterized in relation to a given company, product line, organization, or other entity. The initial consumer data can describe one or a plurality of consumers. The initial consumer data may be provided by a client. For instance, the initial consumer data may have been acquired by a client as the result of making prior sales to a number of customers. Thus, the client provides the initial consumer data as part of a request for suggested consumers best suited for marketing.

In one embodiment, the identifying information for one or more consumers can be provided to a consumer data supplier 1230 who then associates additional consumer data with the consumers identified by the identifying information and provides this additional consumer data in exchange for consideration (e.g., money).

In another embodiment, the identifying information for one or more consumers can be used to extract the additional information from the Internet 1228. For instance, an e-mail address, residential address, telephone number, or some other identifying information can be used to identify social media 1236 content or social media 1236 profiles. Additional consumer data, such as gender, hometown, and age, can be extracted from the social media 1236 content or the social media 1236 profiles.

Social media 1236 content can include TWITTER TWEETS and FACEBOOK updates and wall posts, to name a few non-limiting examples. As one example, a TWITTER user may TWEET, “Today, just bought Canon 24-105 mm f/4 L-series lens.” The text of the TWEET can be extracted, stored, and processed as consumer data, and additional consumer data can be extracted from the TWEET. For instance, a consumer data field for presence on social networks may be filled with a positive value, and/or with another value indicative of the consumer's presence on TWITTER. A consumer behavior field may be populated with the brand of the purchase, “Canon.”

Social media 1236 profiles can include, for example, a user profile page on FACEBOOK, TWITTER, or LinkedIn. Such profiles often contain consumer data such as gender, residential address, alternative names, employers, age, e-mail addresses, and/or links to personal websites.

The additional consumer data can be acquired from both the consumer data supplier 1230 and the Internet 1228. In some cases, additional consumer data on certain consumers will only be available from the consumer data supplier 1230, while data on others will only be available from the Internet 1228. Thus, each of these two sources can be used to fill gaps in the other. For instance, a consumer data supplier 1230 may provide gender for one or more consumers, but the gender of a few consumers may be unknown. The acquire consumer data operation 1202 can look to social media content 1236 on the Internet 1228, such as TWITTER account profiles or LinkedIn account profiles, to fill in the gaps. In another example, a consumer data supplier 1230 may provide consumer affinity based on analysis of consumer credit card histories, but there may be gaps for certain consumers where the credit card histories did not provide sufficient detail. Additional consumer data extracted from the Internet 1228 can supplement the data supplied by the consumer data supplier 1230 and can potentially fill in the gaps in that set of data.

Consumer data can be acquired via a number of different means. In an embodiment, purchase histories can be gleaned from data on credit card usage. Consumer affinity for a brand can be extracted from the text of reviews that a consumer posts on social media and retail websites. Behavior on social networks can be extracted from a consumer's public actions such as ‘liking’ a product on FACEBOOK. These are just a few non-limiting examples of how and where consumer data can be extracted.

The initial consumer data and the additional consumer data can pass to a consumer data database 1234 for temporary or long-term storage. The database 1234 can take the form of a database written to and accessed on memory (e.g., RAM, cache, SSD, HDD) of one or more computing devices (e.g., one or more remote servers). However, the database 1234 can also merely represent a logical state of the consumer data en route to the data selection operation 1204, and thus does not have to be written to memory. In one embodiment, the acquire operation 1202 and the data selection operation 1204 can operate in succession such that consumer data is not written to a memory between the two operations.

The consumer data can also be cleansed before reaching the consumer data database 1234, which means transforming the consumer data into standardized formats to ease comparison. For instance, addresses may be standardized in accordance with US Postal Service standards, or multiple records may be synthesized into a single contact, household or residence. Consumer age and gender can be standardized by converting these attributes into ranges (e.g., 20-29, 30-39, and 40-49).

Data Selection

The data selection operation 1204 then selects a subset of the consumer data so that the evaluation operation 1206 can operate on a smaller and more organized set of data. This means selecting either the best data or the best source of data. By best data it is meant that given multiple data values for a given field, the value most able to assist in segmenting consumers into different segments 1226 is the best data value. By best data source it is meant that given multiple sources of consumer data, one of those sources provides data that is considered most relevant in assisting the segmentation operation 1208. For example, a data value extracted from a blog post may be selected over one provided in a comment to a blog since the blog post is considered the better source (e.g., more reliable or more accurate).

In one embodiment, this means filtering based on criteria intended to filter out data that is less relevant or useful. In another embodiment, the data selection operation 1204 can include a hierarchy of rules to select the subset of consumer data 1222. The hierarchy steps through each rule, eliminating data that meets (or fails) that rule. For instance, a first rule can ask whether any of the data sources are public records. If so, then the data value from the public record (or one of the public records) is selected over all others. If there is no public record source, then the hierarchy moves on to a second rule, and so on. Once data for a given field has been reduced to a single value or a single source, the hierarchy of rules is complete and the data selection operation 1204 can move on to a next field. In another embodiment, a mode or most common value for a data field can be selected. For example, FACEBOOK and TWITTER may both indicate that a consumer is 32 years old, while LinkedIn indicates his/her age as 33. Since the most common value for the age is 32, the data from FACEBOOK and TWITTER can be used rather than that from LinkedIn. In another embodiment, an average or weighted average of values from different sources can be used. For instance, TWITTER may be considered more reliable than FACEBOOK, and therefore when values from FACEBOOK and TWITTER are averaged, a slightly greater weight may be given to the TWITTER value.
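The three selection strategies described above (rule hierarchy, mode, and weighted average) can be sketched as follows. The function names, the source labels, and the reliability weights are hypothetical. The rule hierarchy is simplified here so that the first rule matched by any source decides the value, rather than progressively eliminating candidates.

```python
from collections import Counter

def select_by_rules(candidates, rules):
    """candidates: list of (source, value). rules: ordered predicates on the source.

    Simplified hierarchy: the first rule that any source satisfies picks the value.
    """
    for rule in rules:
        matches = [value for source, value in candidates if rule(source)]
        if matches:
            return matches[0]
    return None

def select_mode(candidates):
    """Most common value across all sources."""
    return Counter(value for _, value in candidates).most_common(1)[0][0]

def weighted_average(candidates, weights):
    """Average of numeric values, weighted by assumed source reliability."""
    total = sum(weights[source] for source, _ in candidates)
    return sum(weights[source] * value for source, value in candidates) / total

ages = [("facebook", 32), ("twitter", 32), ("linkedin", 33)]
towns = [("blog_comment", "Denver"), ("public_record", "Boulder")]
rules = [lambda s: s == "public_record", lambda s: s == "blog_post"]

mode_age = select_mode(ages)
rule_town = select_by_rules(towns, rules)
blended_age = weighted_average(
    [("facebook", 30), ("twitter", 34)], {"facebook": 1.0, "twitter": 3.0}
)
```

Which strategy suits a given field depends on the data: categorical fields such as hometown fit the rule hierarchy or mode, whereas only numeric fields can be averaged.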

Beyond mere selection of data, the data selection operation 1204 may also involve transformation of some or all of the consumer data. Transformation may mean extracting or isolating text or values from content, such as extracting a brand name from a TWEET. For instance, the original consumer data may be a FACEBOOK update such as “Loving my new MacAir,” yet the data extracted from this may be the brand name “Mac” or “Apple.” In another example, transformation may involve extracting data from a photograph, video, or audio file. For instance, facial recognition of a photograph or video can indicate the names of people with whom a consumer associates and whom the consumer may be more likely to influence to make purchases.

Transformation may also include changing a format of the consumer data into a format that is more uniform across the subset of consumer data 1222. For instance, dates (e.g., birthdates and dates of purchases or dates of TWEETS) come in a variety of formats, and thus transformation may include converting all dates into a common format, such as <month, day, year>. As another example, given transactional data that includes purchases made by consumers, transformation may include summarizing this transaction data with a single total purchases value. In another example, transaction data can be summarized in a value representing a customer's lifetime value. As seen, the data selection operation 1204 not only generates a subset of the consumer data 1222 that is smaller and thus more easily analyzed than the original consumer data, but the subset 1222 is also organized in a fashion that eases analysis and speeds the evaluation operation 1206.
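A minimal sketch of the two transformations mentioned above, date normalization and transaction summarization, follows. The list of accepted input formats, the output format, and the function names are assumptions for illustration, and month names are assumed to be in English.

```python
from datetime import datetime

# Hypothetical set of input formats that the sources are assumed to use.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y")

def normalize_date(text):
    """Convert a date string in any known format to month/day/year."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognized date format: {text!r}")

def total_purchases(transactions):
    """Summarize transaction records as a single total purchases value."""
    return sum(amount for _, amount in transactions)

transactions = [("2012-02-21", 120.0), ("10/26/2012", 80.0)]
dates = [normalize_date(date) for date, _ in transactions]
total = total_purchases(transactions)
```

Dates that match none of the known formats raise an error rather than being guessed at, so that malformed values can be reviewed instead of silently entering the subset.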

The data selection operation 1204 can also include extract, transform, and load suboperations. Extraction involves selecting data from one or more sources and extracting it into a memory. Transformation involves transforming the data into forms that are more easily compared, such as ranges of data. Data can then be loaded into the mart 1220 in the load suboperation.

In some embodiments, the consumer data and the subset of consumer data 1222 can be stored or written to a memory 12, such that the data selection operation 1204 can again operate on the consumer data at a later time, or when new consumer data arises.

The subset of consumer data 1222 can be provided to the mart 1220 or to the evaluation operation 1206 or to both. The subset of consumer data 1222 can be a flat set of data, meaning that values or variables in the subset 1222 are one-dimensional.

Evaluation

The evaluation operation 1206 can be performed on the subset of consumer data 1222 before or after the subset 1222 is provided to or stored in the mart 1220. The evaluation operation 1206 analyzes the subset of consumer data 1222 and generates or updates derived fields 1224. The derived fields 1224 can be added to or can be used to update existing fields 1224 in the mart 1220. For instance, the evaluation operation 1206 can be performed on a nightly basis, thus nightly generating and updating the derived fields 1224. The evaluation operation 1206 can also assign scores to each consumer for each of one or more derived fields 1224. More details of the evaluation operation 1206 will be discussed in conjunction with the below description of the derived fields 1224.

Segmentation

Segmentation 1208 is a process of analyzing the derived fields 1224, and optionally also the subset of consumer data 1222, in order to place consumers into segments 1226, where the segments 1226 can be used to aid clients in selecting consumers for marketing and for selecting means of marketing. Consumers can be placed into one or more segments 1226 based on scores assigned to them for each of one or more derived fields 1224. For instance, where a client desires a segment 1226 of the most influential consumers, the segment 1226 may be filled with all consumers having a score of 5 (out of 5) in the derived field of influence.
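The score-based placement described above can be illustrated with the following sketch. The function build_segment, the field names, and the consumer identifiers are hypothetical; the threshold of 5 mirrors the example of a segment of the most influential consumers.

```python
def build_segment(consumers, field, minimum):
    """Return the IDs of consumers whose score in a derived field meets a minimum."""
    return sorted(
        consumer_id
        for consumer_id, fields in consumers.items()
        if fields.get(field, 0) >= minimum
    )

# Derived-field scores on an assumed 1-5 scale.
consumers = {
    "c1": {"influence": 5, "in_market": 2},
    "c2": {"influence": 3},
    "c3": {"influence": 5},
}
most_influential = build_segment(consumers, "influence", 5)
in_market = build_segment(consumers, "in_market", 2)
```

A consumer can appear in more than one segment, since each segment is built independently from its own derived field and threshold.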

One or more segments 1226 and lists or tables of consumers that are assigned to each segment 1226 can be provided to clients. In one case this can be in response to a client request for either suggested segments 1226 or for segments 1226 selected by the client (e.g., segment of most influential consumers). As the derived fields 1224 and the subset of consumer data 1222 are updated, consumers can be moved in and out of the segments 1226; in other words, the segments 1226 can also be updated.

The segments 1226 that exist or that are provided to a client can be automatically selected or can be selected by a client inquiry. A client inquiry can include desired segments 1226 or a description of a desired type of customer where the description can be represented by one or more segments 1226.

Segments can include, but are not limited to, categories related to in/out of market status, demographics (e.g., age, gender, household income), psychographics (e.g., attributes relating to personality, values, attitudes, interests, lifestyles), or behavioral attributes (e.g., recency, frequency, monetary dimension, loyalty).

Once consumers are placed in segments 1226, the segments 1226 can be used to automatically trigger a promotion or other marketing material in trigger promotion operation 1210. The characteristics of the automated promotion can be predefined by a client. Triggering can result from one or more consumers being added to or removed from a segment. The promotion may be sent via a variety of means including, but not limited to, e-mail, mobile message (e.g., SMS), smartphone or tablet computer app, online display advertising, direct mail (e.g., ‘snail mail’), telemarketing, or a point of sale means.
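Triggering on membership changes can be sketched by comparing two snapshots of a segment. The `send` callback stands in for whatever client-defined delivery channel is used (e-mail, SMS, and so on); it and the other names here are assumptions for illustration.

```python
def membership_changes(previous, current):
    """Return (added, removed) consumer IDs between two snapshots of a segment."""
    return current - previous, previous - current


def trigger_promotions(previous, current, send):
    """Invoke the client-defined `send` hook once for every membership change."""
    added, removed = membership_changes(previous, current)
    events = [("added", c) for c in sorted(added)]
    events += [("removed", c) for c in sorted(removed)]
    for kind, consumer in events:
        send(kind, consumer)
    return events


# Hypothetical snapshots of one segment before and after a nightly update.
outbox = []
events = trigger_promotions(
    {"alice", "bob"},
    {"bob", "carol"},
    lambda kind, consumer: outbox.append(f"{kind}:{consumer}"),
)
```

Consumers whose membership is unchanged between snapshots produce no event, so a promotion is not re-sent on every update.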

The segments 1226 can also be used to generate individual marketing suggestions for each consumer in an individual suggestion operation 1214 (e.g., consumer X is in-market and is looking for ping pong equipment). Suggestions can include who the marketing should be aimed at, what type of marketing should be used, and when and where the marketing should be displayed or presented. Alternatively, a provide segments to client operation 1212 can provide one or more segments 1226 to a client based on a description of the client's preferred consumers.

Consumer Data

Consumer data includes any data that can be used to assist in assigning consumers to segments 1226. A few non-limiting categories or fields of consumer data include social network identifiers, presence on social networks, behavior on social networks, content on social networks, contact data, demographics, consumer behavior, consumer affinity, remote system IDs, aggregated customer performance, and detailed transactional data.

The field of social network identifiers can include values indicating different social networks or means to access or link to social networks. For instance, one identifier is a TWITTER handle while a FACEBOOK User ID is another identifier.

The field of presence on social networks (or other types of websites) can include binary values indicating whether a consumer has any presence on any social network. The values may also indicate which social networks the consumer has a presence on.

The field of behavior on social networks (or other types of websites) can be populated with values indicating a consumer's behavior such as time, date, and/or type of activity. Tweeting on TWITTER, updating status on FACEBOOK, and liking a product or brand on FACEBOOK are three exemplary types of activity that can be described as consumer data in the field of behavior on social networks.

The field of content on social networks (or other types of websites) can include any user-generated content such as videos (e.g., FACEBOOK or YOUTUBE videos), photos, status updates, TWEETS, product reviews, blog posts, comments on blogs and articles, shared links, and other content expressing affinity for a brand, company, service, or product (e.g., ‘liking’ a company, brand, or service on FACEBOOK).

The field of contact data can include any data useful for establishing communication with a consumer such as an e-mail address, residential address, phone number, URL, or IP address, to name a few.

Some non-limiting examples of data in the demographic field include age, gender, household income, education level, race, political affiliation, and marital status, to name a few.

Data in the consumer behavior field describes consumer behavior and actions outside the social media context. For instance, behaviors such as purchases and purchases in a particular product category can all be consumer data in the field of consumer behavior. More particularly, consumer behavior may indicate that a consumer predominately shops on AMAZON or most frequently purchases CANON and NIKON branded products.

The field of consumer affinity describes consumer preference for certain products, brands, and other categories of product. For instance, consumer affinity may comprise one or more indicators of a consumer's affinity for CANON or photography products or even a preference for ‘fast’ camera lenses versus slow camera lenses. Consumer affinity may also describe categories of lifestyle that a consumer fits into such as skier, dancer, shooter, gamer, outdoorsman, organic, conservative, adventurous, and placid, to name just a few non-limiting examples of the plethora of lifestyles and types of lifestyles that consumer affinity may describe.

The field of remote system IDs can include identifiers from Customer Relationship Management (CRM), e-mail service provider, or content management systems, to name a few. These IDs may be derived or extracted from other marketing systems such as those of the consumer data provider 1230.

Also, the field of aggregated customer performance information can include values representative of a monetary value of purchases or a number of purchases made over a period of time, to name two examples. Detailed transaction data can include, for instance, purchases, returns, and credits from an electronic commerce (e-commerce) or retail point of sale system.

Consumer data can often be extracted from user-generated Internet content 1228 such as TWEETS, FACEBOOK updates and FACEBOOK Interests, comments on blogs, and product reviews, to name a few. For instance, a consumer may TWEET the following: “Today, just bought Canon 24-105 mm f/4 L-series lens . . . dig Canon . . . now for a macro lens.” Text from the TWEET can be extracted, stored, and processed as consumer data. Various fields can be extracted from this TWEET such as presence on social networks, behavior on social networks, and consumer affinity. A consumer data field for presence on social networks may be filled with a positive value, and/or with a value representing TWITTER. Another consumer data field representing behavior on social networks can be populated with a positive value, and/or can include an indicator that the consumer engaged in a TWEET action and may include indicators of the time and date of the TWEET. Consumer behavior can be extracted as a data point indicating the purchase of a CANON product or specifically the brand and model. The consumer affinity field could here be populated with an indicator of photography equipment for digital single lens reflex (DSLR) cameras or for high-end or professional-level equipment.
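A very rough sketch of populating several consumer data fields from one TWEET is shown below. It relies on naive keyword matching; the brand list, the purchase cue words, and the output layout are all assumptions chosen for the example, and a practical system would likely use far richer text analysis.

```python
# Hypothetical vocabularies; not part of the disclosure.
KNOWN_BRANDS = {"canon", "nikon"}
PURCHASE_CUES = {"bought", "purchased"}


def extract_fields(text, network, posted_at):
    """Populate a few consumer data fields from one piece of user-generated content."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    brands = sorted(words & KNOWN_BRANDS)
    purchased = bool(words & PURCHASE_CUES)
    return {
        "presence": {network: True},
        "behavior": {"action": "post", "network": network, "timestamp": posted_at},
        "purchase_brands": brands if purchased else [],
        "affinity": brands,
    }


tweet = (
    "Today, just bought Canon 24-105 mm f/4 L-series lens . . . "
    "dig Canon . . . now for a macro lens."
)
fields = extract_fields(tweet, "TWITTER", "2012-03-01T09:30:00")
```

The single TWEET fills the presence, behavior, consumer behavior, and affinity fields at once, mirroring the walk-through above.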

Subset of Consumer Data

The subset of consumer data 1222 comprises some or all of the consumer data that was stored in or passed through the consumer data database 1234, and in particular is that selected by the data selection operation 1204. The subset 1222 may include a smaller number of data points than the original set of consumer data as well as a more organized and more uniformly formatted set of consumer data. The evaluation operation 1206 can access some or all contents of the subset of consumer data 1222. The subset of consumer data 1222 can be stored or written to the mart 1220.

The subset of consumer data 1222 can be updated, for instance on a periodic basis. Updating can involve storing new consumer data or new fields, or replacing existing consumer data. For instance, where new sources provide a more accurate estimate of a consumer's household income, the household income field in the subset of consumer data 1222 may be updated.

Derived Fields

Each consumer can be associated with derived fields 1224 that are generated and updated via the evaluation operation 1206. Derived fields include data that has gone through analysis and transformation beyond mere selection of a best data value as carried out by the data selection operation 1204. For instance, some non-limiting examples of derived fields include influence, breadth of digital footprint, reach, recency, frequency, in-market status, depth of consumer-to-consumer relationships, and consumer value to a client. Derived fields are used to ascertain actionable insight about a given consumer.

Derived fields 1224 can be multidimensional data sets. A consumer's overall value to a client can be based upon a multidimensional score where each dimension is represented by a score for a given derived field. Different clients may desire different types of consumers, and thus each client can indicate which dimensions are favored when evaluating a set of consumers.

Each derived field for each consumer can be associated with a score as determined in the evaluation operation 1206. For instance, in the field of influence, ten consumers may be assigned a score of 3 out of 5, two may be assigned scores of 4 out of 5, and one may be assigned a score of 5 out of 5.

Influence can include a consumer's propensity to influence other consumers to achieve a particular behavior (e.g., causing another consumer to make a purchase). For instance, a consumer may TWEET “Just bought Canon 24-105 mm f/4 L-series lens . . . dig Canon . . . now for a macro lens.” The evaluation operation 1206 can track consumers that are associated with this user and determine how many of those other consumers made similar purchases within a reasonable time of this TWEET. Where a substantial number of related consumers purchase CANON camera products within a month of this TWEET, the consumer's influence score may be high. Where no related consumers make a CANON purchase within a month of this TWEET, the consumer's influence score may be low.
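One possible way to quantify the example above is the fraction of a consumer's connections who buy the same brand within a fixed window after the TWEET. The 30-day window, the record layout, and the function name are assumptions for illustration; the disclosure does not prescribe a formula.

```python
from datetime import date, timedelta


def influence_ratio(post_date, connections, purchases, brand, window_days=30):
    """Fraction of connections who bought `brand` within `window_days` of the post."""
    if not connections:
        return 0.0
    window_end = post_date + timedelta(days=window_days)
    buyers = {
        p["consumer"]
        for p in purchases
        if p["consumer"] in connections
        and p["brand"] == brand
        and post_date <= p["date"] <= window_end
    }
    return len(buyers) / len(connections)


# Hypothetical data: four connections, two of whom buy CANON inside the window.
tweet_date = date(2012, 3, 1)
connections = {"bob", "carol", "dave", "erin"}
purchases = [
    {"consumer": "bob", "brand": "CANON", "date": date(2012, 3, 10)},
    {"consumer": "carol", "brand": "CANON", "date": date(2012, 5, 1)},  # too late
    {"consumer": "dave", "brand": "NIKON", "date": date(2012, 3, 5)},   # other brand
    {"consumer": "erin", "brand": "CANON", "date": date(2012, 3, 31)},  # last day of window
]
ratio = influence_ratio(tweet_date, connections, purchases, "CANON")
```

A ratio near 1 would correspond to a high influence score and a ratio of 0 to a low one; how the ratio maps to a 1-to-5 score is left open here.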

Influence can include data describing one or more of the following: number of networks on which the consumer can be confirmed; number of consumers connected to the given consumer on each network; volume of updates by the consumer on each network; content, interaction type, and timestamp of updates for each network; volume of interactions by the consumer's connections with said consumer's updates on each network; content, interaction type, and timestamp of interactions for each network made by the consumer's connections on each network; and frequency of interactions by consumer's connections.

Influence can be derived from quantitative and/or qualitative aspects of content (e.g., social media content). For instance, influence may be derived from a number of TWEETS, but also from content in TWEETS indicating that a user's friends respond positively to the user's product recommendations. In one embodiment, influence can include or be derived from at least three components. The first component can comprise topical terms associated with an organization or company's products, product categories, services, service categories, context of product use, or other topics of interest. The second component can include a sentiment that establishes parameters for positive, negative, or neutral feelings toward the topical objects. The third component can be an intention that represents how strongly content suggests that a purchasing decision is imminent.

Reach can include a reach of a consumer's influence. In particular, reach can include data describing a number of networks on which the consumer can be confirmed and a number of consumers connected to the consumer on each network.

Recency can include one or more indicators of a time of consumer behavior (e.g., time since last purchase or time since last TWEET). In particular, recency can include data describing one or more of the following: volume of updates by the consumer for each network; content, interaction type, and timestamp of updates on each network; volume of interactions by the consumer's connections with said consumer's updates on each network; content, interaction type, and timestamps of interactions for each network made by the consumer's connections on each network; and frequency of interactions by the consumer's connections.

Frequency can include an indicator describing the number of behaviors or actions in a given time that a consumer engages in (e.g., number of monthly Amazon purchases or percentage of purchases for which the consumer creates an online customer review for the product). Particularly, frequency can include data describing one or more of the following: volume of updates by the consumer on each network; content, interaction type, and timestamp of updates on each network; volume of interactions by the consumer's connections with said consumer's updates on each network; content, interaction type, and timestamp of interactions for each network made by the consumer's connections on each network; and frequency of interactions by the consumer's connections.

Footprint can include an indicator describing the number of networks on which a consumer has account activity. Account activity can be detected or inferred. Inferred activity can be based on demographics and behavior. Footprint can include data describing one or more of the following: number of networks on which the consumer can be confirmed; and consumer demographic attributes including age, gender, household income, employer, occupation, and location of primary residence.

In-market status can describe a consumer's likelihood of purchasing a specific product or purchasing from a specific product category, and can include a likelihood as a function of time (e.g., in-market status for the next 2 days versus for the next month). Returning to the CANON lens TWEET example above, the TWEET includes the language “now for a macro lens,” which may indicate that the consumer is in the market for another lens. In contrast, if the consumer posted the following text, “Ford GT now in the garage . . . No need to buy another car ever!,” it might be interpreted as indicating little to no interest in making a further purchase, thus resulting in a lower in-market score. Breadth, depth, recency, and frequency can also be related to these characteristics on social networks (e.g., recency of consumer behavior on social networks).

In-market status can include data describing the following: volume of updates by the consumer for each network; content, interaction type, and timestamp of updates for each network; volume of interactions by the consumer's connections with the consumer's updates on each network; and content, interaction type, and timestamp of interactions for each network made by the consumer's connections on each network.

Each of these derived fields 1224 can also constitute a dimension of derived fields 1224 associated with a consumer such that each consumer can be represented by a multidimensional field or vector. Furthermore, each derived field 1224, or dimension of the multidimensional field or vector, can have a weight, such that some fields or dimensions have a greater influence on what segment 1226 a consumer is placed into than others. For instance, influence can be weighted more heavily than reach. Alternatively, different social media or Internet sources can have different weights (e.g., FACEBOOK would likely be more heavily weighted than MYSPACE in 2012).
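A weighted combination of derived-field scores can be sketched as a weighted average over the dimensions of a consumer's vector. The weights below (influence counted three times as heavily as reach) are invented for the example, and a weighted average is only one of several ways such weights could be applied.

```python
def weighted_value(scores, weights):
    """Weighted average of a consumer's derived-field scores.

    `weights` maps each derived field to its relative importance; a field
    missing from `scores` contributes zero.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(scores.get(field, 0) * w for field, w in weights.items())
    return weighted_sum / total_weight


# Hypothetical client preference: influence matters three times as much as reach.
client_weights = {"influence": 3, "reach": 1}

first = weighted_value({"influence": 5, "reach": 2}, client_weights)
second = weighted_value({"influence": 2, "reach": 5}, client_weights)
```

Both consumers have the same raw scores in a different order, yet the first ranks higher because the client favors influence.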

Derived fields 1224 can each include a score, quantifying the field as it relates to each consumer. These scores can be based on comparison of consumers. For instance, a consumer having greater influence than another consumer can have a higher influence score. To compare consumers, indexes can be assigned to each consumer in each of one or more derived fields 1224, and a score can be derived from the index that indicates a consumer's value in a field relative to other consumers. For instance, a first consumer may have an index for influence of 52 while a second consumer has an index of 98. Based on these indexes, the second consumer would be assigned a higher score for influence than the first consumer.

In an embodiment, a baseline index is set for all consumers. Each consumer receives an index, which can be used to determine a distribution of consumers relative to the baseline. The distribution can include one or more breaks separating regions of the distribution in which consumers are assigned the same score. For instance, indexes may range from 0 to 100 (a baseline of 50), with each 20 points in index corresponding to a different score (e.g., index range=score: 0-20=1, 21-40=2, 41-60=3, 61-80=4, 81-100=5). The method of determining indexes can be such that the majority of consumers receive a score of 2, 3, or 4, while only a handful of scores of 1 or 5 are assigned. In other embodiments, the ranges of indexes can be unequal (e.g., index range=score: 0-10=1, 11-30=2, 31-70=3, 71-80=4, 81-100=5). While derived fields 1224 do not need to be assigned scores, assigned scores can make the segmentation 1208 faster, less complex, and more consistent.
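The mapping from index to score can be sketched with a sorted list of breaks and a binary search. The equal breaks below follow the 20-point ranges given above; the second, unequal list is a hypothetical variation added only to show that the same function handles other break positions.

```python
from bisect import bisect_left

# Inclusive upper bounds of the ranges for scores 1 through 4; anything above is 5.
EQUAL_BREAKS = [20, 40, 60, 80]


def index_to_score(index, breaks=EQUAL_BREAKS):
    """Map an integer index in 0..100 to a score using inclusive upper-bound breaks."""
    if not 0 <= index <= 100:
        raise ValueError("index must be between 0 and 100")
    return bisect_left(breaks, index) + 1


# The two consumers from the earlier example, with indexes of 52 and 98.
first_score = index_to_score(52)
second_score = index_to_score(98)

# Hypothetical unequal breaks that widen the middle range.
wide_middle = [10, 30, 70, 80]
middle_score = index_to_score(65, wide_middle)
```

Moving the breaks changes how many consumers fall into each score without changing how indexes themselves are computed.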

Scores can represent a value of a consumer in a given derived field 1224. For instance, in the field of in-market status, there may be five scores (1-5). A 0 may be assigned where not enough information exists to determine the likelihood of being in-market. A score of 1 may be assigned to those consumers least likely to be in-market, a 2 to those somewhat likely to be in-market. A 3 can be assigned to those moderately likely to be in-market. Scores of 4 and 5 are assigned to consumers who are more likely to be in-market and most likely to be in-market, respectively. These are just a few example scores and their meanings.

In an embodiment, an index can be derived based on a number of subscores or data. For instance, an index for in-market status can be assigned based on factors such as the volume of updates by the consumer in each network and the volume of interactions by the consumer's connections with the consumer's updates on each network. Each factor considered in determining an index can have a weight, where the weight determines a factor's importance in assigning an index.
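Deriving an index from weighted factors can be sketched as follows. The sketch assumes each factor has already been normalized to the range 0 to 1 and that the index is a weighted average scaled to 0 to 100; the factor names, weights, and normalization are assumptions and are not specified in the disclosure.

```python
def factor_index(factors, weights):
    """Combine normalized factors (each in 0..1) into an integer index in 0..100."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    combined = sum(factors.get(name, 0.0) * w for name, w in weights.items())
    return round(100 * combined / total_weight)


# Hypothetical weights: the consumer's own update volume counts twice as much
# as interactions by the consumer's connections.
weights = {"update_volume": 2, "connection_interactions": 1}
factors = {"update_volume": 0.8, "connection_interactions": 0.5}

in_market_index = factor_index(factors, weights)
```

Raising a factor's weight makes the index track that factor more closely, which is the sense in which the weight determines a factor's importance.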

The Mart

The mart 1220 can include the subset of consumer data 1222 and the derived fields 1224. It may consist of a memory residing on a computing device or distributed among two or more computing devices (e.g., remote servers). The subset of consumer data 1222 and the derived fields 1224 can be stored in or written to the mart 1220 or can logically reside within the mart 1220.

FIG. 13 illustrates another arrangement of logical components and functions for marketing based on social media content and other Internet content. In this embodiment a load file operation 1302 loads a file to a computing system, where the file includes at least a consumer identifier (e.g., e-mail address, telephone number, residential mailing address). The consumer identifier is provided to a consumer identifier staging area 1304 where the consumer identifier can be formatted to ease readability and can be accessed for further processing. Calls 1306 are then made to consumer data providers who return consumer data that is associated with the consumer identified by the consumer identifier. This additional consumer data can be passed to a consumer data staging area 1308, where the data is prepared for further analysis, integration, and organization.

An extract, transform, and load operation 1310 (or data selection) then extracts the consumer data from the consumer data staging area 1308, transforms those portions of the data that need transforming into a form that is more easily analyzed and organized, and loads this transformed consumer data into a normalized mart 1312. The normalized mart 1312 includes a subset of the consumer data where the subset comprises data of common forms that are easily compared, searched, and analyzed.

Evaluation 1314 can then be performed on the subset of consumer data in the normalized mart 1312 to generate derived fields. The derived fields can be stored or written to the normalized mart 1312 and/or updated within the normalized mart 1312. The derived fields and optionally the subset of consumer data can then pass through a segmentation operation 1316 where the consumers associated with the consumer data are assigned to segments (e.g., influence, in-market status, recency, frequency, etc.). The segments can then be passed to a client in a provide operation 1318 or can trigger a promotion 1320 where the promotion and the triggering mechanisms can be selected by a client.

The systems and methods described herein can be implemented in a machine such as a computer system in addition to the specific physical devices described herein. FIG. 14 shows a diagrammatic representation of one embodiment of a machine in the exemplary form of a computer system 1400 within which a set of instructions can execute for causing a device to perform or execute any one or more of the aspects and/or methodologies of the present disclosure. The components in FIG. 14 are examples only and do not limit the scope of use or functionality of any hardware, software, embedded logic component, or a combination of two or more such components implementing particular embodiments.

Computer system 1400 may include a processor 1401, a memory 1403, and a storage 1408 that communicate with each other, and with other components, via a bus 1440. The bus 1440 may also link a display 1432, one or more input devices 1433 (which may, for example, include a keypad, a keyboard, a mouse, a stylus, etc.), one or more output devices 1434, one or more storage devices 1435, and various tangible storage media 1436. All of these elements may interface directly or via one or more interfaces or adaptors to the bus 1440. For instance, the various tangible storage media 1436 can interface with the bus 1440 via storage medium interface 1426. Computer system 1400 may have any suitable physical form, including but not limited to one or more integrated circuits (ICs), printed circuit boards (PCBs), mobile handheld devices (such as mobile telephones or PDAs), laptop or notebook computers, distributed computer systems, computing grids, or servers.

Processor(s) 1401 (or central processing unit(s) (CPU(s))) optionally contain a cache memory unit 1402 for temporary local storage of instructions, data, or computer addresses. Processor(s) 1401 are configured to assist in execution of computer readable instructions. Computer system 1400 may provide functionality as a result of the processor(s) 1401 executing software embodied in one or more tangible computer-readable storage media, such as memory 1403, storage 1408, storage devices 1435, and/or storage medium 1436. The computer-readable media may store software that implements particular embodiments, and processor(s) 1401 may execute the software. Memory 1403 may read the software from one or more other computer-readable media (such as mass storage device(s) 1435, 1436) or from one or more other sources through a suitable interface, such as network interface 1420. The software may cause processor(s) 1401 to carry out one or more processes or one or more steps of one or more processes described or illustrated herein. Carrying out such processes or steps may include defining data structures stored in memory 1403 and modifying the data structures as directed by the software.

The memory 1403 may include various components (e.g., machine readable media) including, but not limited to, a random access memory component (e.g., RAM 1404) (e.g., a static RAM “SRAM”, a dynamic RAM “DRAM”, etc.), a read-only component (e.g., ROM 1405), and any combinations thereof. ROM 1405 may act to communicate data and instructions unidirectionally to processor(s) 1401, and RAM 1404 may act to communicate data and instructions bidirectionally with processor(s) 1401. ROM 1405 and RAM 1404 may include any suitable tangible computer-readable media described below. In one example, a basic input/output system 1406 (BIOS), including basic routines that help to transfer information between elements within computer system 1400, such as during start-up, may be stored in the memory 1403.

Fixed storage 1408 is connected bidirectionally to processor(s) 1401, optionally through storage control unit 1407. Fixed storage 1408 provides additional data storage capacity and may also include any suitable tangible computer-readable media described herein. Storage 1408 may be used to store operating system 1409, EXECs 1410 (executables), data 1411, API applications 1412 (application programs), and the like. Often, although not always, storage 1408 is a secondary storage medium (such as a hard disk) that is slower than primary storage (e.g., memory 1403). Storage 1408 can also include an optical disk drive, a solid-state memory device (e.g., flash-based systems), or a combination of any of the above. Information in storage 1408 may, in appropriate cases, be incorporated as virtual memory in memory 1403.

In one example, storage device(s) 1435 may be removably interfaced with computer system 1400 (e.g., via an external port connector (not shown)) via a storage device interface 1425. Particularly, storage device(s) 1435 and an associated machine-readable medium may provide nonvolatile and/or volatile storage of machine-readable instructions, data structures, program modules, and/or other data for the computer system 1400. In one example, software may reside, completely or partially, within a machine-readable medium on storage device(s) 1435. In another example, software may reside, completely or partially, within processor(s) 1401.

Bus 1440 connects a wide variety of subsystems. Herein, reference to a bus may encompass one or more digital signal lines serving a common function, where appropriate. Bus 1440 may be any of several types of bus structures including, but not limited to, a memory bus, a memory controller, a peripheral bus, a local bus, and any combinations thereof, using any of a variety of bus architectures. As an example and not by way of limitation, such architectures include an Industry Standard Architecture (ISA) bus, an Enhanced ISA (EISA) bus, a Micro Channel Architecture (MCA) bus, a Video Electronics Standards Association local bus (VLB), a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, an Accelerated Graphics Port (AGP) bus, a HyperTransport (HTX) bus, a serial advanced technology attachment (SATA) bus, and any combinations thereof.

Computer system 1400 may also include an input device 1433. In one example, a user of computer system 1400 may enter commands and/or other information into computer system 1400 via input device(s) 1433. Examples of an input device(s) 1433 include, but are not limited to, an alpha-numeric input device (e.g., a keyboard), a pointing device (e.g., a mouse or touchpad), a joystick, a gamepad, an audio input device (e.g., a microphone, a voice response system, etc.), an optical scanner, a video or still image capture device (e.g., a camera), and any combinations thereof. Input device(s) 1433 may be interfaced to bus 1440 via any of a variety of input interfaces 1423 (e.g., input interface 1423) including, but not limited to, serial, parallel, game port, USB, FIREWIRE, THUNDERBOLT, or any combination of the above.

In particular embodiments, when computer system 1400 is connected to network 1430, computer system 1400 may communicate with other devices, specifically mobile devices and enterprise systems, connected to network 1430. Communications to and from computer system 1400 may be sent through network interface 1420. For example, network interface 1420 may receive incoming communications (such as requests or responses from other devices) in the form of one or more packets (such as Internet Protocol (IP) packets) from network 1430, and computer system 1400 may store the incoming communications in memory 1403 for processing. Computer system 1400 may similarly store outgoing communications (such as requests or responses to other devices) in the form of one or more packets in memory 1403, to be communicated to network 1430 from network interface 1420. Processor(s) 1401 may access these communication packets stored in memory 1403 for processing.

Examples of the network interface 1420 include, but are not limited to, a network interface card, a modem, and any combination thereof. Examples of a network 1430 or network segment 1430 include, but are not limited to, a wide area network (WAN) (e.g., the Internet, an enterprise network), a local area network (LAN) (e.g., a network associated with an office, a building, a campus or other relatively small geographic space), a telephone network, a direct connection between two computing devices, and any combinations thereof. A network, such as network 1430, may employ a wired and/or a wireless mode of communication. In general, any network topology may be used.

Information and data can be displayed through a display 1432. Examples of a display 1432 include, but are not limited to, a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, a cathode ray tube (CRT), a plasma display, and any combinations thereof. The display 1432 can interface to the processor(s) 1401, memory 1403, and fixed storage 1408, as well as other devices, such as input device(s) 1433, via the bus 1440. The display 1432 is linked to the bus 1440 via a video interface 1422, and transport of data between the display 1432 and the bus 1440 can be controlled via the graphics control 1421.

In addition to a display 1432, computer system 1400 may include one or more other peripheral output devices 1434 including, but not limited to, an audio speaker, a printer, and any combinations thereof. Such peripheral output devices may be connected to the bus 1440 via an output interface 1424. Examples of an output interface 1424 include, but are not limited to, a serial port, a parallel connection, a USB port, a FIREWIRE port, a THUNDERBOLT port, and any combinations thereof.

In addition or as an alternative, computer system 1400 may provide functionality as a result of logic hardwired or otherwise embodied in a circuit, which may operate in place of or together with software to execute one or more processes or one or more steps of one or more processes described or illustrated herein. Reference to software in this disclosure may encompass logic, and reference to logic may encompass software. Moreover, reference to a computer-readable medium may encompass a circuit (such as an IC) storing software for execution, a circuit embodying logic for execution, or both, where appropriate. The present disclosure encompasses any suitable combination of hardware, software, or both.

Those of skill in the art would understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.

Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

What is claimed is:
 1. A method for generating reports enhancing an understanding of Internet users based on their generated content and actions taken by others in response to the generated content, the method comprising: collecting content or other raw data via a crawler module that accesses webpages; associating the content or other raw data with a social profile residing in or being added to a memory; calculating scores for the social profile based on terms parsed from the content or other raw data, wherein the scores are based at least in part on a number and quality of relationships between a user associated with the social profile and other users; receiving a query, via the network interface, for users fitting one or more contexts; identifying social profiles fitting the one or more contexts; and returning a report in response to the query comprising the social profiles fitting the one or more contexts wherein an order of the social profiles is based on the scores.
 2. The method of claim 1, wherein the calculating scores is performed based on one or more of reach, relevance, and impact scores for each of the terms parsed from the content or other raw data.
 3. The method of claim 2, wherein: the relevance scores represent a quantity of content generated over a period of time that is germane to one of the terms and how relevant the content is to the one of the terms; the reach scores represent a number of relationships between a user and other users as well as a quality of those relationships; and the impact scores represent a number and quality of actions taken in response to a user's publication of content on the Internet.
 4. The method of claim 1, further comprising: receiving another query for contexts fitting one or more users; identifying the contexts fitting the one or more users; and returning another report in response to the another query having the contexts.
 5. The method of claim 1, further comprising generating a graph datastore including nodes representing users and including edges between the nodes, where the edges represent relationships between the users, and where the edges are weighted based on a quality of each relationship.
 6. The method of claim 5, further comprising adding a node to the graph datastore for every URL crawled.
 7. The method of claim 5, wherein the relationships are identified based on (1) explicit links between users and (2) user actions that imply relationship.
 8. A method for generating reports enhancing an understanding of Internet users based on their generated content and actions taken by others in response to the generated content, the method comprising: seeding a crawler module; crawling the Internet to find new users that can be used to create new social profiles and crawling the Internet to populate and update existing social profiles, the crawling based on the seeding; parsing content or other raw data generated by the first and second crawling, into terms; associating the terms and the content or other raw data with the existing social profiles and the new social profiles; computing scores for each term, based on one or more of reach, relevance, and impact, where: reach is based on a number of relationships between a user associated with a social profile and other users as well as a weight assigned to each relationship, relevance is based upon a quantity of content generated relative to a term over a period of time that is germane to the term and how relevant each piece of content is to the term, and impact is based on a number and quality of events triggered by content generated by a user relative to one or more terms; associating the scores for each term with the existing social profiles and the new social profiles; receiving a query via the network interface for a ranking of social profiles for one or more contexts, each context describing a combination of any one or more terms; generating contextual scores based on the scores and the one or more contexts; returning a report via the network interface, the report including one or more social profiles matching the one or more contexts and ranked in terms of the contextual scores.
 9. The method of claim 8, further comprising generating a graph datastore including nodes representing users and including edges between the nodes, where the edges represent relationships between the users, and where the edges are weighted based on a quality of each relationship.
 10. The method of claim 9, further comprising adding a node to the graph datastore for every URL crawled.
 11. The method of claim 9, wherein reach is based on a number of nodes that a given node is connected to as well as a weight assigned to the edges between the given node and nodes connected to the given node.
 12. The method of claim 8, wherein a marketing client provides customer data used in the seeding and the customer data is used to generate the existing social profiles.
 13. The method of claim 8, wherein the seeding is based at least in part on a query including one or more contexts.
 14. A method for generating reports enhancing an understanding of Internet users based on their generated content and actions taken by others in response to the generated content, the method comprising: providing, via an API, a query for users fitting a context, the context describing products, services, and/or a type of customer; providing customer data describing existing customers; creating first existing social profiles based on the customer data describing existing customers; generating URLs from the context; seeding a crawler module with the URLs; crawling the Internet to create first new social profiles and to update and populate the first existing social profiles, the crawling based on the URLs, the first new social profiles and the first existing social profiles together referred to as second existing social profiles; extracting additional URLs from content and metadata returned via the crawling; seeding the crawler module with the additional URLs; crawling the Internet to create second new social profiles and to update and populate the second existing social profiles, the crawling based on the additional URLs, the second new social profiles and the second existing social profiles together referred to as third existing social profiles; associating terms parsed from the content and metadata returned by the first and second crawling with the third existing social profiles; computing scores for each term for the context, and assigning the scores to associated social profiles in the third existing social profiles, the scores reflecting one or more of a reach, relevance, and impact for the context; returning a report via the network interface, the report comprising the social profiles fitting the context wherein an order of the social profiles is based on one or more of the reach, relevance, and impact for the context.
 15. The method of claim 14, wherein the scores are based at least in part on a number and quality of relationships between a user associated with the at least one of the third existing social profiles and other users.
 16. The method of claim 15, wherein the relationships are identified based on (1) explicit links between users and (2) user actions that imply relationships.
 17. The method of claim 15, wherein the computing is based on a graph datastore including nodes representing users, and includes edges between the nodes, where the edges represent relationships between the users, and wherein the edges are weighted based on a quality of each relationship.
 18. The method of claim 17, wherein the crawler module adds a node to the graph datastore for every URL crawled.
 19. The method of claim 17, wherein a weight assigned to an edge is greater for an edge generated based on an explicit link between users than an edge generated based on user actions that imply a relationship between users.
 20. The method of claim 15, wherein: the relevance represents a quantity of content generated over a period of time that is germane to one of the terms and how relevant the content is to the one of the terms; the reach represents a number of relationships between a user and other users as well as a quality of those relationships; and the impact represents a number and quality of actions taken in response to a user's publication of content on the Internet.
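
By way of non-limiting illustration only, the weighted graph datastore, the reach computation, and the score-ordered report recited in the claims above might be sketched as follows. The class and function names (GraphDatastore, add_relationship, reach, rank_profiles) and the two numeric weights are hypothetical and do not appear in the disclosure; the only property taken from the claims is that an edge based on an explicit link is weighted more heavily than an edge based on an implied relationship, and that reach reflects both the number and the weight of a user's relationships. Summing edge weights is one assumed way to combine the two, not necessarily the one contemplated.

```python
from collections import defaultdict

# Hypothetical weights (assumption): explicit links outweigh implied ones.
EXPLICIT_WEIGHT = 1.0
IMPLIED_WEIGHT = 0.5


class GraphDatastore:
    """Toy in-memory graph: nodes are users, weighted edges are relationships."""

    def __init__(self):
        # Maps a user to {other_user: edge_weight}.
        self.edges = defaultdict(dict)

    def add_relationship(self, a, b, explicit):
        """Add an undirected edge, keeping the strongest weight seen for the pair."""
        weight = EXPLICIT_WEIGHT if explicit else IMPLIED_WEIGHT
        weight = max(weight, self.edges[a].get(b, 0.0))
        self.edges[a][b] = weight
        self.edges[b][a] = weight

    def reach(self, user):
        """Sum of edge weights, so both number and quality of relationships count."""
        return sum(self.edges.get(user, {}).values())


def rank_profiles(graph, users):
    """Order users by descending reach; ties are broken by name for determinism."""
    return sorted(users, key=lambda u: (-graph.reach(u), u))
```

In this sketch a user with two explicit links outranks a user with one explicit link and one implied link, which in turn outranks a user with a single explicit link. Relevance and impact scores, and the per-context combination of scores, are omitted for brevity.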